PREFACE


This book is an outgrowth of lectures on the theory of probability
which the author has given at Stanford University for a number of 
years. At first a short mimeographed text covering only the elementary 
parts of the subject was used for the guidance of students. As time 
went on and the scope of the course was gradually enlarged, the necessity 
arose of putting into the hands of students a more elaborate exposition 
of the most important parts of the theory of probability. Accordingly 
a rather large manuscript was prepared for this purpose. The author 
did not plan at first to publish it, but students and other persons who had 
opportunity to peruse the manuscript were so persuasive that publication 
was finally arranged. 

The book is arranged in such a way that the first part of it, consisting 
of Chapters I to XII inclusive, is accessible to a person without advanced 
mathematical knowledge. Chapters VII and VIII are, perhaps, excep- 
tions. The analysis in Chapter VII is rather involved and a better way 
to arrive at the same results would be very desirable. At any rate, a 
reader who does not have time or inclination to go through all the 
intricacies of this analysis may skip it and retain only the final results, 
found in Section 11. Chapter VIII, though dealing with interesting 
and historically important problems, is not important in itself and may 
without loss be omitted by readers. Chapters XIII to XVI incorporate 
the results of modern investigations. Naturally they are more complex 
and require more mature mathematical preparation. 

Three appendices are added to the book. Of these the second is by 
far the most important. It gives an outline of the famous Tshebysheff- 
Markoff method of moments applied to the proof of the fundamental 
theorem previously established by another method in Chapter XIV. 

No one will dispute Newton's assertion: “In scientiis addiscendis
exempla magis prosunt quam praecepta.” But especially is it so in the
theory of probability. Accordingly, not only are a large number of 
illustrative problems discussed in the text, but at the end of each chapter 
a selection of problems is added for the benefit of students. Some of 
them are mere examples. Others are more difficult problems, or even 
important theorems which did not find a place in the main text. In all 
such cases sufficiently explicit indications of solution (or proofs) are given. 



The book does not go into applications of probability to other sciences. 
To present these applications adequately another volume of perhaps 
larger size would be required. 

No one is more aware than the author of the many imperfections in 
the plan of this book and its execution. To present an entirely satis- 
factory book on probability is, indeed, a difficult task. But even with 
all these imperfections we hope that the book will prove useful, especially 
since it contains much material not to be found in other books on the
same subject in the English language. 

J. V. Uspensky, 

Stanford University, 

September, 1937

INTRODUCTION TO
MATHEMATICAL PROBABILITY


INTRODUCTION 

Quanto enim minus rationis terminis comprehendi posse
videbatur, quae fortuita sunt atque incerta, tanto admirabilior
ars censebitur, cui ista quoque subjacent.

Chr. Huygens, 

De ratiociniis in ludo aleae. 

1. It is always difficult to describe with adequate conciseness and 
clarity the object of any particular science; its methods, problems, and 
results are revealed only gradually. But if one must define the scope 
of the theory of probability the answer may be this: The theory of 
probability is a branch of applied mathematics dealing with the effects of 
chance. Here we encounter the word “chance,” which is often used in
everyday language but with rather indefinite meaning. To make clearer 
the idea conveyed by this word, we shall try first to clarify the opposite 
idea expressed in the word “necessity.” Necessity may be logical or
physical. The statement “The sum of the angles in a triangle is equal 
to two right angles” is a logical necessity, provided we assume the 
axioms of Euclidean geometry; for in denying the conclusion of the 
admitted premises, we violate the logical law of contradiction. 

The following statements serve to illustrate the idea of physical 
necessity: 

A piece of iron falls down if not supported. 

Water boils if heated to a sufficiently high temperature. 

A die thrown on a board never stands on its edge. 

The logical structure of all these statements is the same: When certain 
conditions which may be termed “causes” are fulfilled, a definite effect 
occurs of necessity. But the nature of this kind of necessity is different 
from that of logical necessity. The latter, with our organization of 
mind, appears absolute, while physical necessity is only a result of 
extensive induction. We have never known an instance in which water, 
heated to a high temperature, did not boil; or a piece of iron did not fall 
down; or a die stood on its edge. For that reason we are led to believe 
that in the preceding examples (and in innumerable similar instances) 
the effect follows from its “cause” of necessity. 

Instead of the term “physical necessity” we may introduce the
abstract idea of “natural law.” Thus, it is a “natural law” that the
piece of iron left without support will fall down. Natural laws derived
from extensive experiments or observations may be called “empirical
laws” to distinguish them from theoretical laws. In all exact sciences
which have reached a high degree of development, such as astronomy, 
physics, and chemistry, scientists endeavor to build up an abstract and 
simplified image of the infinitely complex physical world — an image 
which can be described in mathematical terms. With the help of 
hypotheses and some artificial concepts, it becomes possible to derive 
mathematically certain laws which, when applied to the world of reality, 
represent many natural phenomena with an amazing degree of accuracy. 
It is true that in the development of the sciences it sometimes becomes 
necessary to recast the previously accepted image of the physical world, 
but it is remarkable that the fundamental theoretical laws even then 
undergo but slight modification in substance or interpretation. 

The chief endeavor of the exact sciences is the discovery of natural 
laws, and their formulation is of the greatest importance to the promotion 
of human knowledge in general and to the extension of our powers over 
natural phenomena. 

Are the events caused by natural laws absolutely certain? No, 
but for all practical purposes they may be considered as certain. It is 
possible that one or another of the natural laws may fail, but such 
failure would constitute a real “miracle.” However, granted that the
possibility of miracles is consistent with the nature of scientific knowledge, 
actually this possibility may be disregarded. 

2. If the preceding explanations throw a faint light upon the concept 
of necessity, it now remains to illuminate by comparison some characteristic
features inherent in the concept of “chance.” To say that chance
is a denial of necessity is too vague a statement, but examples may help 
us to understand it better. 

If a die is thrown upon a board we are certain that one of the six faces 
will turn up. But whether a particular face will show depends on what
we call chance and cannot be predicted. Now, in the act of tossing a 
die there are some conditions known to us: first, that it is nearly cubic 
in shape; further, if it is a good die, its material is as nearly as possible 
homogeneous. Besides these known conditions, there are other factors 
influencing the motion of the die which are completely inaccessible to our 
knowledge. First among them are the initial position and the impulse 
imparted by the player's hand. These depend on an “act of will” (an
agent which may act without any recognizable motivation) and therefore
they are outside the domain of rational knowledge. Second, supposing 
the initial conditions known, the complexity of the resulting motion 
defies any possibility of foreseeing the final result. 

Another example: If equal numbers of white and black balls, which do
not differ in any respect except in color, are concealed in an urn, and we
draw one of them blindly, it is certain that its color will be either white 
or black, but whether it will be black or white we cannot predict: that 
depends on chance. In this example we again have a set of known 
conditions: namely, that balls in equal numbers are white and black, and 
that they are not distinguishable except in color. But the final result 
depends on other conditions completely outside our knowledge. First, 
we know nothing about the respective positions of the white and black 
balls; second, the choice of one or the other depends on an act of will. 

It is an observed fact that the numbers of marriages, divorces, births,
deaths, suicides, etc., per 1,000 of population, in a country with nearly 
settled living conditions and during not too long a period of time, do not 
remain constant, but oscillate within comparatively narrow limits. For 
a given year it is impossible to predict what will be their numbers: that 
depends on chance. For, besides some known conditions, such as the 
level of prosperity, sanitation, and many other things, there are unnum- 
bered factors completely outside our knowledge. 

Many other examples of a similar kind can be cited to illustrate the 
notion of chance. They all possess a common logical structure which 
can be described as follows: An event A may materialize under certain
known or “fixed” conditions, but not necessarily; for under the same fixed
conditions other events B, C, D, . . . are also possible. The materialization
of A depends also upon other factors completely outside our
control and knowledge. Consequently, whether A will materialize or 
not under such circumstances cannot be foreseen; the materialization of 
A is due to chance, or, to express it concisely, A is a contingent event. 

3. The idea of necessity is closely related to that of certainty. Thus 
it is “certain” that everybody will die in the due course of time. In
the same way the idea of chance is related to that of probability or likelihood.
In everyday language, the words “probability” and “probable”
are used with different shades of meaning. By saying, “Probably it will 
rain tomorrow,” we mean that there are more signs indicating rainy 
weather than fair for tomorrow. On the other hand, in the statement, 
“There is little probability in the story he told us,” the word “proba- 
bility” is used in the sense of credibility. But henceforth we shall use 
the word as equivalent to the degree of credence which we may place 
in the possibility that some contingent event may materialize. The 
“degree of credence” implies an almost instinctive desire to compare 
probabilities of different events, or facts. That such comparison is 
possible one can gather from the following examples:

I live on the second floor and can reach the ground either by using
the stairway or by jumping from the window. Either way I might be
injured, though not necessarily. How do the probabilities of being
injured compare in the two cases? Everyone, no doubt, will say that
the probability of being injured by jumping from the window is greater
than the probability of being injured while walking down the stairway.
Such universal agreement might be due either to personal experience or 
merely to hearsay about similar experiences of other persons. 

An urn contains an equal number of white and black balls that are 
similar in all respects except color. One ball is drawn. It may be either 
black or white. How do the probabilities of these two cases compare? 
One almost instinctively answers: “They are equal.”

Now, if there are 10 white balls and 1 black ball in the urn, what 
about the probabilities of drawing a white or a black ball? Again one 
would say without hesitation that the probability of drawing a white ball 
is greater than that of drawing a black ball.

Thus, probability appears to be something which admits of comparisons
in magnitude, but so far only in the same way as in the intensity of
pain produced by piercing the skin with needles.

But it is a noteworthy observation that men instinctively try to 
characterize probabilities numerically in a naive and unscientific manner. 
We read regularly in the sporting sections of newspapers, predictions 
that in a coming race a certain horse has two chances against one to 
win over another horse, or that the chances of two football teams are as 
10 to 7, etc. No doubt experts do know much about the respective 
horses and their riders, or the comparative strengths of two competing 
football teams, but their numerical estimates of chances have no other 
merit than to show the human tendency to assign numerical values to 
probabilities which most likely cannot be expressed in numbers. 

It is possible that a man endowed with good common sense and ripe 
judgment can weigh all available evidence in order to compare the 
probabilities of the various possible outcomes and to direct his actions 
accordingly so as to secure profit for himself or for society. But precise 
conclusions can never be attained unless we find a satisfactory way to
represent or to measure probabilities by numbers, at least in some cases. 

4. As in other fields of knowledge, in attempting to measure proba- 
bilities by numbers, we encounter difficulties that cannot be avoided 
except by making certain ideal assumptions and agreements. In 
geometry (we speak of applied and not of abstract geometry), before 
explaining how lengths of rectilinear segments can be measured, we must 
first agree on criteria of equality of two segments. Similarly, in dealing 
with probability, the first step is to answer the question: When may two 
contingent events be considered as equally probable or, to use a more 
common expression, equally likely? From the statements of Jacob 
Bernoulli, one of the founders of the mathematical theory of probability, 
one can infer the following criterion of equal probability:

Two contingent events are considered as equally probable if, after taking
into consideration all relevant evidence, one of them cannot be expected in
preference to the other.

Certainly there is some obscurity in this criterion, but it is hardly 
possible to substitute any better one. To be perfectly honest, we must 
admit that there is an unavoidable obscurity in the principles of all the 
sciences in which mathematical analysis is applied to reality. 

The application of Bernoulli’s criterion to particular cases is beset 
with difficulties and requires good common sense and keen judgment.
There is much truth in Laplace’s statement: “La théorie des probabilités
n’est au fond que le bon sens réduit au calcul.”

To elucidate the nature of these difficulties, let us consider an urn
filled with white and black balls, but in unknown proportion. The only 
evidence we have, namely, that there are both white and black balls in 
the urn, in this case appears insufficient for any conclusion about the 
respective probabilities of drawing a white or a black ball. We instinc- 
tively think of the numbers of the two kinds of balls, and, being in 
ignorance on this point, we are inclined to suspend judgment. But if we 
know that white and black balls are equal in number and distributed 
without any sort of regularity, this knowledge appears sufficient to 
assume the equality of the probabilities of drawing a white or a black 
ball. It is possible that, perhaps unconsciously, we are influenced by the 
commonly known fact that if we repeatedly draw a ball out of the urn 
many times, returning the ball each time before drawing again, the white 
and the black balls appear in nearly equal numbers. 

If an urn contains a certain number of identical balls distinguished 
from one another by some characteristic signs, for example, by the 
numbers 1, 2, 3, ... , the knowledge that the balls are identical and 
are distributed without regularity suffices in this case to cause us to 
conclude that the probabilities for drawing any of the balls should be 
considered as equal. Again, in so readily assuming this conclusion we 
may be influenced by the fact empirically observed (by ourselves or by 
others) that in a long series of drawings, with balls being restored to 
the urn after each withdrawal, the balls appear with nearly the same 
frequency. 

An ordinary die is tossed. Should we consider the possible numbers 
of points 1, 2, 3, 4, 5, 6 as equally probable? To pronounce any judg- 
ment, we must know something about the die. If it is known that the 
die has a regular cubic shape and that its material is homogeneous, we 
readily agree on the equal probabilities of all the numbers of points 
1, 2, 3, 4, 5, 6. And this a priori conclusion, based on Bernoulli’s cri- 
terion, agrees with the observed fact that each number of points does 
appear nearly an equal number of times in a long series of throws, if the 
die is a good one. However, if we only know that the die has a regular
shape, but not whether or not it is loaded, it is only sensible to suspend 
judgment. 
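
The agreement between the a priori conclusion and the observed frequencies can be illustrated by a small simulation. The following is only a sketch and not part of the original exposition: it assumes that a pseudorandom generator is an acceptable stand-in for a good die, and the name `throw_frequencies` is introduced here purely for illustration.

```python
import random
from collections import Counter


def throw_frequencies(n_throws, seed=0):
    """Relative frequency of each face in n_throws simulated throws.

    Assumption: random.Random plays the role of a homogeneous, regular die.
    The seed is fixed only so that the run can be repeated.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_throws))
    return {face: counts[face] / n_throws for face in range(1, 7)}


freqs = throw_frequencies(60_000)
for face, freq in freqs.items():
    # Each frequency should lie close to 1/6 = 0.1667 in a long series.
    print(face, round(freq, 4))
```

In a long series every face appears in nearly the same proportion; a loaded die would be modeled by unequal weights, and no such agreement could then be expected.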

These examples show that before trying to apply Bernoulli’s criterion,
we must have at our disposal some evidence the amount of which cannot 
be determined by any general rules. It may be also that the reason a 
priori must be supplemented by some empirical evidence. In some 
cases, lacking sufficient grounds to assert equal probabilities for two 
events, we may assume them as a hypothesis, to be kept until for some 
reason we are forced to abandon it. 

5. Besides the ticklish question: When are we entitled to consider
events as equally probable? there is another fundamental assumption 
required to make possible the measurement of probabilities by numbers. 

Events $a_1, a_2, \ldots, a_n$ form an exhaustive set of possibilities under
certain fixed conditions S, if at least one of them must necessarily materialize.
They are mutually exclusive if any two of them cannot materialize
simultaneously. The fundamental assumption referred to consists in
the possibility of subdividing results consistent with the conditions S
into a number of exhaustive, mutually exclusive, and equally likely
events, or cases (as they are commonly called):

$$a_1, a_2, \ldots, a_n.$$

This being granted, the probability of any one of these cases is assumed
to be $1/n$.

An event A may materialize in several mutually exclusive particular 
forms: $\alpha, \beta, \ldots, \lambda$; that is, if A occurs, then one and only one of the
events $\alpha, \beta, \ldots, \lambda$ occurs also, and conversely the occurrence of one of
these events necessitates the occurrence of A. Thus, if A consists in
drawing an ace from a deck of cards, A may materialize in four mutually 
exclusive forms: as an ace of hearts, diamonds, clubs, or spades. 

Let an event A be represented by its particular forms $a_1, a_2, \ldots, a_m$,
which together with other events $a_{m+1}, a_{m+2}, \ldots, a_n$ constitute an
exhaustive set of mutually exclusive and equally likely cases consistent with
the conditions S. Events $a_1, a_2, \ldots, a_m$ are called “cases favorable to A.”

Definition of Mathematical Probability. If, consistent with conditions
S, there are n exhaustive, mutually exclusive, and equally likely cases, and
m of them are favorable to an event A, then the mathematical probability of
A is defined as the ratio m/n.

In drawing a card from a full deck there are 52 and no more mutually 
exclusive and equally likely cases; 4 of them are favorable for drawing an 
ace; hence the probability of drawing an ace is $4/52 = 1/13$.
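
The definition lends itself to direct computation: list the equally likely cases, count those favorable to the event, and form the ratio m/n. The sketch below does this for the deck of cards. The helper name `probability` and the representation of a card as a (denomination, suit) pair are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

# The n = 52 exhaustive, mutually exclusive, equally likely cases.
deck = list(product(RANKS, SUITS))


def probability(cases, is_favorable):
    """Classical probability m/n: favorable cases over all cases."""
    m = sum(1 for case in cases if is_favorable(case))
    return Fraction(m, len(cases))


p_ace = probability(deck, lambda card: card[0] == "A")
print(p_ace)  # 1/13
```

Exact fractions are used instead of decimals so that the result can be compared directly with the ratio m/n of the definition.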

From an urn containing 10 white, 20 black, and 5 red balls, one ball is 
drawn. Here, distinguishing individual balls, we have 35 equally likely 
cases. Among them there are 10, 20, and 5 cases, favorable respectively
to a white, a black, or a red ball. Hence the probabilities of drawing a
white, a black, or a red ball are, respectively, $2/7$, $4/7$, and $1/7$.

In the first example, instead of 52 cases, we may consider only 13 
cases according to the denominations of the cards. These cases being 
regarded as equally likely, there is only one of them favorable to an 
ace. The probability of drawing an ace is $1/13$. This observation makes
it clear that the subdivision of all possible results into equally likely 
cases can be done in various ways. To avoid contradictory estimations 
of the same probability we must always observe the following rules: 

Two events are equally likely if each of them can be represented by 
equal numbers of equally likely forms. 

Two events are not equally likely if they are represented by unequal 
numbers of equally likely forms. 

Thus, if two equally likely events are each represented by different 
numbers of their respective forms, then the latter cannot be considered as 
equally likely. 

Each card is characterized by its denomination and the suit to which 
it belongs. Noting denominations, we distinguish 13 cases, but each 
of these is represented by 4 new cases according to the suit to which the 
card belongs. Altogether we have, then, 52 cases recognized as equally 
likely; hence, the above-mentioned 13 cases should be considered as 
equally likely. 
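
The rule can be checked mechanically for the deck of cards: each of the 13 denominations is represented by the same number of the 52 finer cases, so the coarser subdivision is consistent with the finer one and both give the same probability of an ace. The following sketch is an illustration under that reading; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
cards = list(product(ranks, suits))  # the 52 equally likely cases

# Number of equally likely forms (suits) representing each denomination.
forms_per_rank = Counter(rank for rank, _suit in cards)

# The 13 coarse cases are equally likely only if all counts coincide.
coarse_cases_equally_likely = len(set(forms_per_rank.values())) == 1

p_ace_fine = Fraction(forms_per_rank["A"], len(cards))  # 4 of 52 cases
p_ace_coarse = Fraction(1, len(forms_per_rank))         # 1 of 13 cases

print(coarse_cases_equally_likely, p_ace_fine, p_ace_coarse)
```

Had the denominations been represented by unequal numbers of cards, as in an incomplete deck, the check would fail and the 13 cases could not be treated as equally likely.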

In connection with the definition of mathematical probability, 
mention should be made of an important principle not always explicitly 
stated. If 


$$a_1, a_2, \ldots, a_n; \quad b_1, b_2, \ldots, b_m$$

are all mutually exclusive and equally likely cases consistent with
certain conditions, and the indication of the occurrence of an event B
makes cases $b_1, b_2, \ldots, b_m$ impossible, cases $a_1, a_2, \ldots, a_n$ still should be
considered as equally likely. To illustrate this principle, consider an 
urn with six tickets bearing numbers 1, 2, ... 6. Two tickets are 
drawn in succession. If nothing is known about the number of the first 
ticket, we still have six possibilities for the number of the second ticket, 
which we agree to consider as equally likely. But as soon as the number 
of the first ticket becomes known, then there are only five cases left 
concerning the number of the second ticket. According to the above 
principle we must consider these five cases as equally likely.
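
The ticket example can be worked out by enumeration. The sketch below assumes, as an illustration, that the 30 ordered pairs (first ticket, second ticket) are the equally likely cases; the function name `prob_second` is hypothetical. Fixing the first ticket removes the incompatible cases, and the remaining ones are still counted as equally likely.

```python
from fractions import Fraction
from itertools import permutations

# All ordered draws of two distinct tickets out of six: 30 cases.
draws = list(permutations(range(1, 7), 2))


def prob_second(k, first=None):
    """Probability that the second ticket bears number k.

    If `first` is given, the cases incompatible with that first ticket are
    discarded and the remaining ones are treated as equally likely.
    """
    cases = [d for d in draws if first is None or d[0] == first]
    favorable = [d for d in cases if d[1] == k]
    return Fraction(len(favorable), len(cases))


print(prob_second(5))           # 1/6: first ticket unknown
print(prob_second(5, first=3))  # 1/5: first ticket known to be 3
print(prob_second(3, first=3))  # 0: ticket 3 is no longer in the urn
```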

Probability as defined above is represented by a number contained 
between 0 and 1. In the extreme case in which the probability is 0, it 
indicates the impossibility of an event. On the contrary, in the other 
extreme case in which the probability is 1, the event is certain. When 
the probability is expressed by a number very near to 1, it means that
the overwhelming majority of cases are favorable to the event. On the 
contrary, a probability near to 0 shows that the proportion of favorable 
cases is small. 

From our experience we know that events with a small probabil- 
ity seldom happen. For instance, if the probability of an event is 
1/1,000,000, the situation may be likened to the drawing of a white ball 
from an urn containing 999,999 black balls and a single white one. 
This white ball is practically lost among the majority of black balls, and 
for all practical purposes we may consider its extraction impossible. 
Similarly, the probability 999,999/1,000,000 may be considered, from a 
practical standpoint, as an indication of certainty. What limit for 
smallness of probability is to be set as an indication of practical impos- 
sibility? Evidently there is no general answer to this question. Every- 
thing depends on the risk we can face if, contrary to expectation, an 
event with a small probability should occur. Hence, the main problem 
of the theory of probability consists in finding cases in which the probability
is very small or very near to 1. Instead of saying, “The probability
is very near to 1,” we shall say, “great probability,” although,
of course, the probability can never exceed 1. 

7. The definition of mathematical probability in Sec. 5 is essentially 
the classical definition proposed by Jacob Bernoulli and adopted by 
Laplace and almost all the important contributors to the theory of 
probability. But, since the middle of the nineteenth century (Cournot, 
John Stuart Mill, Venn), and especially in our days, the classical definition 
has been severely criticized. Several attempts have been made to rear 
up the edifice of the mathematical theory of probability on quite a 
different definition of mathematical probability. It does not enter into 
our plan to criticize these new definitions, but, in the opinion of the 
author, many of them are self-contradictory. Modern attempts to build 
up the theory of probability as an axiomatic science may be interesting 
in themselves as mental exercises; but from the standpoint of applica- 
tions the purely axiomatic science of probability would have no more 
value than, for example, would the axiomatic theory of elasticity. 

The most serious objection to the classical definition is that it can 
be used only in very simple and comparatively unimportant cases like 
games of chance. This objection, stressed by von Mises, is in reality 
not a new one. It is one of the objections Leibnitz made against Jacob 
Bernoulli's views concerning the possibility of applications of the theory 
of probability to various important fields of human endeavor and not 
merely to games of chance. 

It is certainly true that the classical definition cannot be directly
applied in many important cases. But is it the fault of the definition
or is it rather due to our ignorance of the innermost mechanisms which,
apart from chance, contribute to the materialization or nonmaterializa- 
tion of contingent events? It seems that this is what Jacob Bernoulli 
meant in his reply to Leibnitz: 

Objiciunt primo, aliam esse rationem calculorum, aliam morborum aut muta- 
tionum aeris; illorum numerum determinatum esse, horum indeterminatum et 
vagum. Ad quod respondeo, utrumque respectu cognitionis nostrae aeque poni
incertum et indeterminatum; sed quicquam in se et sua natura tale esse, non 
magis a nobis posse concipi, quam concipi potest, idem simul ab Auctore naturae 
creatum esse et non creatum: quaecumque enim Deus fecit, eo ipso dum fecit, 
etiam determinavit.¹

8. A brilliant example of how the profound study of a subject finally 
makes it possible to apply the classical definition of mathematical 
probability is afforded in the fundamental laws of genetics (a science of 
comparatively recent origin, whose importance no one can deny), dis- 
covered by the Augustinian monk, Gregor Mendel (1822-1884). During 
eight years Mendel² conducted experimental work in crossing different
varieties of the common pea plant with the purpose of investigating how 
pairs of contrasting characters were inherited. For the pea plant there 
are several pairs of such contrasting characters: round or wrinkled seeds,
tallness or dwarfness, yellow or green pod color, etc. Let us concentrate 
our attention on a definite pair of contrasting characters, yellow or green 
pod color. Peas with green pod color always breed true. Also some 
peas with yellow color always breed true, while still others produce both 
varieties. True breeding pea plants constitute two pure races: A with
yellow pod color and B with green pod color, while plants with yellow 
pods not breeding true constitute a hybrid race, C. Crossing plants of 
the race A with those of the race B and planting the seeds, Mendel 
obtained a first generation $F_1$ of hybrids. Letting plants of the first
generation self-fertilize and again planting their seeds to produce the
second generation $F_2$, Mendel found that in this generation there were
428 yellow pod plants and 152 green pod plants in the ratio 2.82:1. 
In regard to other contrasting characters the ratio of approximately 3 : 1 
was observed in all cases. Later experimental work only confirmed 
Mendel’s results. Thus, combined experiments of Correns, Tschermak,
and others gave among 195,477 individuals of $F_2$, 146,802 yellow pod
plants and 48,675 green pod plants, in the ratio 3.016: 1. 
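
The two quoted ratios follow from the counts by simple division. The short check below is an illustration only; the function name `ratio_to_one` is introduced here and is not part of the original text.

```python
def ratio_to_one(yellow, green):
    """Express yellow : green as x : 1 and return x."""
    return yellow / green


mendel_ratio = ratio_to_one(428, 152)
later_ratio = ratio_to_one(146_802, 48_675)

print(round(mendel_ratio, 2))  # 2.82
print(round(later_ratio, 3))   # 3.016
```

The larger sample lies much closer to 3:1 than Mendel's own 580 plants, which is what one expects if 3:1 is the underlying ratio.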

¹ To understand the beginning of this statement see the translation from “Ars
conjectandi” in Chap. VI, p. 105.

² Mendel’s results were published in 1865, but passed completely unnoticed until
in about 1900 the same facts were rediscovered by DeVries, Correns, and Tschermak. 
Modern genetics dates from about this time. 

Mendel not only discovered such remarkable regularities, but also
suggested a rational explanation of the observed ratio 3:1, which with 
some modifications is accepted even today. Bodies of plants and 
animals are built up of enormous numbers of cells, among which the 
reproductive cells, or gametes, differ from the remaining “somatic”
cells in some important qualities. Cells are not homogeneous, but 
possess a definite structure. In somatic cells there are found bodies, 
called chromosomes, whose number is even and the same for the same 
species. Exactly half of this number of chromosomes is found in repro- 
ductive cells. Chromosomes are supposed to be seats of hypothetical 
“genes,” which are considered as bearers of various heritable characters.
A chromosome of one pure race A bearing a character a differs from the 
homologous chromosome of another pure race B bearing a contrasting 
character b in that they contain genes of different kinds. Since characters
a and b are borne by definite chromosomes, the situation in regard to the
two characters a and b is exactly the same as if gametes of both races
contained just one chromosome. Let us represent them symbolically by
○ and ●. In the act of fertilization a pair of paternal and maternal
gametes conjugate and form a zygote, which by division and growth 
produces all cells of the filial generation. Certain of these cells become 
the germ cells and are set apart for the formation, by a complicated 
process, of gametes, one half of which contain the chromosome of the 
paternal type and the other half that of the maternal type. 

According to this theory, in crossing two individuals belonging to 
races A and B, zygotes of the first generation $F_1$ will be of the type
○-● and will produce gametes, in equal numbers, of the types ○, ●.
Now if two individuals of $F_1$ (hybrids) are crossed (or one individual
self-fertilized as in the cases of some plants), one paternal gamete con- 
jugates with one maternal, and for the resulting zygote there are four 
possibilities: 

○-○   ○-●   ●-○   ●-●

These possibilities may be considered as equally probable, whence
the probabilities for an individual of the generation $F_2$ to belong respectively
to the races A, B, C are $1/4$, $1/4$, $1/2$. Similarly, one easily finds that
in crossing an individual of the race A with one of the hybrid race C,
the probabilities of the offspring belonging to A or C are both equal to
$1/2$. It is easy now to offer a rational explanation of the Mendelian ratio
3:1. In the case of pea plants, individuals of the race A and hybrids
are not distinguishable in regard to the color of their pods. Hence the
probability of the offspring of a hybrid plant having yellow pods is
$3/4$, while for the offspring to have green pods the probability is $1/4$.
When the generation $F_2$ consists of a great many individuals, the theory
of probability shows that the ratio of the number of yellow pod plants to
the number of green pod plants is not likely to differ much from the ratio 
3:1. In crossing plants of the race A with hybrids, the offspring, if 
numerous, will contain plants of race A or C, respectively, in a proportion 
which is not likely to differ much from 1:1. And this conclusion was 
experimentally verified by Mendel himself. 
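
The enumeration of zygotes described above can be checked mechanically. The following is only a minimal sketch: the labels "a" and "b" for the two chromosome types, and the function name `cross`, are assumptions made for illustration and do not come from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Probability of each offspring type when every pairing of one gamete
    from each parent is treated as equally likely."""
    pairs = list(product(parent1, parent2))
    counts = Counter("".join(sorted(pair)) for pair in pairs)
    return {zygote: Fraction(n, len(pairs)) for zygote, n in counts.items()}

# Assumed labels: race A = "aa", race B = "bb", hybrid race C = "ab".
f2 = cross("ab", "ab")          # hybrid crossed with hybrid
backcross = cross("aa", "ab")   # race A crossed with hybrid
print(f2)
print(backcross)
```

Under these assumptions the hybrid cross gives races A, B, C in the proportions 1/4, 1/4, 1/2, and the cross of A with C gives A and C with probability 1/2 each.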

9. If in the case of the Mendelian laws the profound study of the 
mechanism of heredity together with hypothetical assumptions of the 
kind used in physics, chemistry, etc., paved the way for a rational 
explanation of observed phenomena on the basis of the theory of proba- 
bility, in many other important instances we are still unable to reach the 
same degree of scientific understanding. Stability of statistical ratios 
observed in many cases suggests the idea that they should be explained 
on the basis of probability. For instance, it has been observed that 
the ratio of human male and female births is nearly 51:49 for large 
samples, and this is largely independent of climatic conditions, racial 
differences, living conditions in different countries, etc. Although the 
factors determining sex are known, yet some complications not sufficiently 
cleared up prevent estimation of probabilities of male and female 
births. 

In all instances of the pronounced stability of statistical ratios we 
may believe that some day a way will be found to estimate probabilities 
in such cases. Therefore many applications of the theory of probability 
to important problems of other sciences are based on belief in the existence 
of the probabilities with which we are concerned. In other cases in 
which the theory of probability is used, we may have grave doubts 
as to whether this science is applied legitimately. The fact that many 
applications of probability are based on belief or faith should not dis- 
courage us; for it is better to do something, though it may be not quite 
reliable, than nothing. Only we must not be overconfident about the 
conclusions reached under such circumstances. 

After all, is not faith at the bottom of all scientific knowledge? 
Physicists speak of electrons, which never have been seen and are known 
only through their visible manifestations. Electrons are postulated 
just to coordinate into a coherent whole a large variety of observed 
phenomena. Is not this faith? It must be, for according to Paul 
(Hebrews 11:1), “Faith is the substance of things hoped for, the evidence 
of things not seen.” 

10. In concluding this introduction it remains to give a short account 
of the history of the theory of probability. Although ancient philoso- 
phers discussed at length the necessity and contingency of things, it 
seems that mathematical treatment of probability was not known to the 
ancients. Apart from casual remarks of Galileo concerning the correct 
evaluation of chances in a game of dice, we find the true origin of the 
science of probability in the correspondence between two great men of 
the seventeenth century, Pascal (1623-1662) and Fermat (1601-1665). 
A French nobleman, Chevalier de Méré, a man of ability and great 
experience in gambling, asked Pascal to explain some seeming contradictions 
between his theoretical reasoning and the observations gathered 
from gambling. Pascal solved this difficulty and attacked another 
problem proposed to him by de Méré. On hearing from Pascal about 
these problems, Fermat became interested in them, and in their private 
correspondence these two great men laid the first foundations of the 
science of probability. Bertrand's statement, “Les grands noms de 
Pascal et de Fermat décorent le berceau de cette science,” cannot be 
disputed. 

Huygens (1629-1695), a great Dutch scientist, became acquainted 
with the contents of this correspondence and, spurred on by the new 
ideas, published in 1657 a first book on probability, “De ratiociniis in 
ludo aleae,” in which many interesting and rather difficult problems on 
probabilities in games of chance were solved. To him we owe the 
concept of “mathematical expectation” so important in the modern 
theory of probability. 

Jacob Bernoulli (1654-1705) meditated on the subject of probability 
for about twenty years and prepared his great book, “Ars conjectandi,” 
which, however, was not published until 1713, eight years after his 
death, by his nephew, Nicholas Bernoulli. Bernoulli envisaged the 
subject from the most general point of view, and clearly foresaw a whole 
field of applications of the theory of probability outside of the narrow 
circle of problems relating to games of chance. To him is due the 
discovery of one of the most important theorems, known as “Bernoulli's 
theorem.” 

The next great successor to Bernoulli is Abraham de Moivre (1667- 
1754), whose most important work on probability, “The Doctrine of 
Chances,” was first published in 1718 and twice reprinted in 1738 and 
in 1756. De Moivre does not contribute much to the principles, but this 
work is justly renowned for new and powerful methods for the solution 
of more difficult problems. Many important results, ordinarily attrib- 
uted to Laplace and Poisson, can be found in de Moivre's book. 

Laplace (1749-1827), whose contributions to celestial mechanics 
assured him everlasting fame in the history of astronomy, was very 
much interested in the theory of probability from the very beginning of 
his scientific career. After writing several important memoirs on the 
subject, he finally published, in 1812, his great work “Théorie analytique 
des probabilités,” accompanied by a no less known popular exposition, 
“Essai philosophique sur les probabilités,” destined for the general 
educated public. Laplace's work, on account of the multitude of new 
ideas, new analytic methods, and new results, in all fairness should be 
regarded as one of the most outstanding contributions to mathematical 
literature. It exercised a great influence on later writers on probability 
in Europe, whose work chiefly consisted in elucidation and development 
of topics contained in Laplace’s book. 

Thus in European countries further development of the theory of 
probability was somewhat retarded. But the subject took on important 
developments in the works of Russian mathematicians: Tshebysheff 
(1821-1894) and his former students, A. Markoff (1856-1922) and A. 
Liapounoff (1858-1918). Castelnuovo in his fine book “Calcolo delle 
probabilità” rightly regards the contributions to the theory of probability 
due to Russian mathematicians as the most important since the time of 
Laplace. 

At the present time interest in the theory of probability is revived 
everywhere, but again the most outstanding recent contributions have 
been made in Russia, chiefly by three prominent mathematicians: S. 
Bernstein, A. Khintchine, and A. Kolmogoroff. 

In closing this introduction it seems proper to quote the closing 
words of the “Essai philosophique sur les probabilités”: 

On voit par cet Essai, que la théorie des probabilités n'est au fond, que le bon 
sens réduit au calcul: elle fait apprécier avec exactitude, ce que les esprits justes 
sentent par une sorte d'instinct, sans qu'ils puissent souvent s'en rendre compte. 
Elle ne laisse rien d'arbitraire dans le choix des opinions et des partis à prendre, 
toutes les fois que l'on peut, à son moyen, déterminer le choix le plus avantageux. 
Par là, elle devient le supplément le plus heureux, à l'ignorance et à la faiblesse 
de l'esprit humain. Si l'on considère les méthodes analytiques auxquelles cette 
théorie a donné naissance, la vérité des principes qui lui servent de base, la 
logique fine et délicate qu'exige leur emploi dans la solution des problèmes, les 
établissements d'utilité publique qui s'appuient sur elle, et l'extension qu'elle a 
reçue et qu'elle peut recevoir encore, par son application aux questions les plus 
importantes de la Philosophie naturelle et des sciences morales; si l'on observe 
ensuite, que dans les choses mêmes qui ne peuvent être soumises au calcul, elle 
donne les aperçus les plus sûrs qui puissent nous guider dans nos jugements, 
et qu'elle apprend à se garantir des illusions qui souvent nous égarent; on verra 
qu'il n'est point de science plus digne de nos méditations. 
CHAPTER I 


COMPUTATION OF PROBABILITIES BY DIRECT 
ENUMERATION OF CASES 


1. The probability of an event can be found by direct application 
of the definition when it is possible to make a complete enumeration of 
all equally likely cases, as well as of those favorable to that event. Here 
we shall consider a few problems, beginning with the simplest, to illustrate 
this direct method of evaluating probabilities. 

Problem 1. Two dice are thrown. What is the probability of 
obtaining a total of 7 or 8 points? 

Solution. Suppose we distinguish the dice by the numbers 1 and 2. 
There are 6 possible cases as to the number of points on the first die; 
and each of these cases can be accompanied by any of the 6 possible 
numbers of points on the second die. Hence, we can distinguish alto- 
gether 6 X 6 = 36 different cases. Provided the dice are ideally regular 
in shape and perfectly homogeneous, we have good reason to consider 
these 36 cases as equally likely, and we shall so consider them. 

Next, let us find out how many cases are favorable to the total of 
7 points. This may happen only in the following ways: 

First Die     1  2  3  4  5  6 
Second Die    6  5  4  3  2  1 

Likewise, for 8 points: 

First Die     2  3  4  5  6 
Second Die    6  5  4  3  2 

That is, out of the total number of 36 cases there are 6 cases favorable 
to 7 points and 5 cases favorable to 8 points; hence, the probability of 
obtaining 7 points is $\frac{6}{36}$ and the probability of obtaining 8 points is $\frac{5}{36}$. 
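
The count of favorable cases in Problem 1 can be reproduced by listing all 36 cases. This is a minimal sketch; the helper name `prob_total` is an assumption introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 6 X 6 = 36 equally likely cases (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def prob_total(total):
    """Probability that the two dice show the given total."""
    favorable = [o for o in outcomes if sum(o) == total]
    return Fraction(len(favorable), len(outcomes))

print(prob_total(7), prob_total(8))
```

The fractions are reduced automatically, so 6/36 appears as 1/6.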

Problem 2. A coin is tossed three times in succession. What 
is the probability of obtaining 2 heads? What is the probability of 
obtaining tails at least once? 


Solution. In the first throw there are two possible cases: heads or 
tails. And if the coin is unbiased (which we assume is true) these two 
cases may be considered as equally likely. In two throws there are 
2 X 2 = 4 cases; namely, both of the two possible cases in the first toss 
can combine with both of the possible cases in the second. Similarly, 
in three throws the number of cases will be 2 X 2 X 2 = 8. To find 
the number of cases favorable to obtaining 2 heads, we must consider 
that this can happen only in three ways: 

Heads Heads Tails 
Heads Tails Heads 
Tails Heads Heads 

The number of favorable cases being 3, the probability of obtaining 
two heads is $\frac{3}{8}$. 

To answer the second part of the question, we observe that there is 
only one case when tails does not turn up. Therefore, the number of 
cases favorable to obtaining tails at least once is 8 - 1 = 7, so that 
the required probability is $\frac{7}{8}$. 
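
Both answers of Problem 2 can be confirmed by writing out the 8 cases. The sketch below is illustrative only; the letters "H" and "T" and the helper `prob` are assumed names.

```python
from fractions import Fraction
from itertools import product

# 2 X 2 X 2 = 8 equally likely sequences of heads (H) and tails (T).
tosses = list(product("HT", repeat=3))

def prob(event):
    """Fraction of the 8 cases in which the event holds."""
    return Fraction(sum(1 for t in tosses if event(t)), len(tosses))

two_heads = prob(lambda t: t.count("H") == 2)
at_least_one_tail = prob(lambda t: "T" in t)
print(two_heads, at_least_one_tail)
```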

3. Problem 3. Two cards are drawn from a deck of well-shuffled 
cards. What is the probability that both the extracted cards are 
aces? 

Solution. Since there are 52 cards in the deck, there are 52 ways 
of extracting the first card. After the first card has been withdrawn, 
the second extracted card may be one of the remaining 51 cards. There- 
fore, the total number of ways to draw two cards is 52 X 51. All these 
cases may be considered as equally likely. 

To find the number of cases favorable to drawing aces, we observe 
that there are 4 aces; therefore, there are 4 ways to get the first ace. 
After it has been extracted, there are 3 ways to get a second ace. Hence, 
the total number of ways to draw 2 aces is 4 X 3, and the required 
probability is: 

$$\frac{4 \times 3}{52 \times 51} = \frac{1}{13 \times 17} = \frac{1}{221}.$$

Problem 4. Two cards are drawn from a full pack, the first card 
being returned to the pack before the second is taken. What is the 
probability that both the extracted cards belong to a specified suit? 

Solution. There are 52 ways of getting the first card. For the 
second drawing, there are also 52 ways, because by returning the first 
extracted card to the pack, the original number was restored. Under 
such circumstances, the total number of ways to extract two cards is 
52 X 52. Now, because there are 13 cards in a suit, the number of 
cases favorable to obtaining two cards of a specified suit is 13 X 13. 
Therefore, the required probability is given by: 

$$\frac{13 \times 13}{52 \times 52} = \frac{1 \times 1}{4 \times 4} = \frac{1}{16}.$$
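
Problems 3 and 4 differ only in whether the first card is returned to the pack. The sketch below enumerates both situations; the representation of a card as a (rank, suit) pair, with rank 1 standing for the ace and "S" for the specified suit, is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

# A full pack as (rank, suit) pairs; rank 1 stands for the ace.
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]

# Problem 3: ordered draws without replacement, 52 X 51 cases.
draws = list(permutations(deck, 2))
both_aces = Fraction(
    sum(1 for c1, c2 in draws if c1[0] == 1 and c2[0] == 1), len(draws))

# Problem 4: ordered draws with replacement, 52 X 52 cases.
draws_back = list(product(deck, repeat=2))
both_in_suit = Fraction(
    sum(1 for c1, c2 in draws_back if c1[1] == "S" and c2[1] == "S"),
    len(draws_back))

print(both_aces, both_in_suit)
```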

4. Problem 5. An urn contains 3 white and 5 black balls. One 
ball is drawn. What is the probability that it is black? 

Solution. The total number of balls is 8. To distinguish them, we 
may imagine that they are numbered. As to the number on the ball 
drawn, there are 8 possible cases that may reasonably be considered as 
equally likely. Obviously, there are 5 cases favorable to the black color 
of the ball drawn. Therefore, the required probability is $\frac{5}{8}$. 

By a slight modification of the last problem, we come to the following 
interesting situation : 

Problem 6. The contents of the urn are the same as in the foregoing 
problem. But this time we suppose that one ball is drawn, and, its color 
unnoted, laid aside. Then another ball is drawn, and we are required to 
find the probability that it is black or white. 

Solution. Suppose again that the balls are numbered, so that the 
white balls bear numbers 1, 2, and 3; and the black balls bear numbers 
4, 5, 6, 7, 8. Obviously, there are 8 ways to get the first ball, and what- 
ever it is, there remain only 7 ways to get the second ball. The total 
number of equally likely cases is 8 X 7 = 56. 

It is a little more difficult to find the number of cases favorable to 
extracting a white or black ball in the second drawing. Suppose we are 
interested in the white color of the second ball. If the first ball drawn is 
a white one, it may bear one of the numbers 1 to 3. Whatever this 
number is, the second ball, if it is white, can bear only the two remaining 
numbers. Therefore, under the assumption that the first ball is a white 
one, the number of favorable cases is 3 X 2 = 6. Again, supposing that 
the first ball drawn is black, we have 5 possibilities as to its number, and, 
corresponding to any one of these possibilities, there are 3 possibilities 
as to the number of the white ball to be taken in the second drawing, 
so that the number of favorable cases now is 5 X 3 = 15. The number 
of all favorable cases is 6 + 15 = 21. The required probability for 
the white ball is $\frac{21}{56} = \frac{3}{8}$. In the same way, we should find 
that the probability for the black ball is $\frac{35}{56} = \frac{5}{8}$. It is remarkable that 
these two probabilities are the same as if only a single ball had been 
drawn. 

The situation is quite different if we know the color of the first ball. 
Suppose, for instance, that it is white. The total number of equally 
likely cases will then be 3 X 7 = 21; and the number of cases favorable 
to getting another white ball is 3 X 2 = 6, so that the probability in 
this case is $\frac{6}{21} = \frac{2}{7}$. 

This last example shows clearly how much probability depends upon 
a given or known set of conditions. 
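
The contrast between the two situations in Problem 6 can be made concrete by enumerating the 56 ordered draws and, for the second question, restricting attention to those in which the first ball is white. This is a sketch under assumed names (`prob`, `given`), not a procedure taken from the text.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W"] * 3 + ["B"] * 5                      # positions 0..7 stand for the numbered balls
pairs = list(permutations(range(len(balls)), 2))   # 8 X 7 = 56 ordered draws

def prob(event, given=lambda first, second: True):
    """Probability of `event` among the draws that satisfy `given`."""
    cases = [(f, s) for f, s in pairs if given(balls[f], balls[s])]
    favorable = [(f, s) for f, s in cases if event(balls[f], balls[s])]
    return Fraction(len(favorable), len(cases))

second_white = prob(lambda first, second: second == "W")
second_white_after_white = prob(
    lambda first, second: second == "W",
    given=lambda first, second: first == "W")
print(second_white, second_white_after_white)
```

With the color of the first ball unknown the result is 3/8; once the first ball is known to be white it becomes 2/7.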

5. Problem 7. Three boxes, identical in appearance, each have two 
drawers. The first box contains a gold coin in each drawer; the second 
contains a silver coin in each drawer; but the third contains a gold coin 
in one drawer and a silver coin in the other. (a) A box is chosen at random. 
What is the probability that it contains coins of different metals? 
(b) A box is chosen, one of its drawers opened, and a gold coin found. 
What is the probability that the other drawer contains a silver coin? 

Solution. (a) Since nothing outwardly distinguishes one box from 
the other, we may recognize three equally likely cases, and among them 
is only one case of a box with coins of different metals. Therefore, we 
estimate the required probability as $\frac{1}{3}$. 

(b) As to the second question, one is tempted to reason as follows: 
The fact that a gold coin was found in one drawer leaves only two 
possibilities as to the content of the other drawer; namely, that the coin 
in it is either gold or silver. Hence, the probability of a silver coin in 
the second drawer seems to be $\frac{1}{2}$. But this reasoning is fallacious. 
It is true that, when the gold coin is found in one drawer, there are only 
two possibilities left as to the content of the other drawer; but these 
possibilities cannot be considered as equally likely. To see this point 
clearly, let us distinguish the drawers of the first box by the numbers 1 
and 2; those of the second box by the numbers 3 and 4; finally, in the 
third box, 5 will distinguish the drawer containing the silver coin, while 
6 will represent the drawer with the gold coin. 

Instead of three equally likely cases: 

box 1, box 2, box 3 

we now have six cases: 

drawers 1, 2; drawers 3, 4; drawers 5, 6, 

which, with reference to the fundamental assumptions, must be con- 
sidered as equally likely. If nothing were known about the contents 
of the drawer which has been opened, the number of this drawer might be 
either 1, 2, 3, 4, 5, or 6. But as soon as the gold coin is discovered in it, 
cases 3, 4, and 5 become impossible, and there remain three equally likely 
assumptions as to the number of the opened drawer: it may be either 1 or 
2 or 6. That leaves three cases, and in only one of them, namely, in 
case 6, will the other drawer contain a silver coin. Thus the answer 
to the second question (b) is $\frac{1}{3}$. 
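
The drawer-by-drawer argument for part (b) translates directly into a short enumeration. The sketch below follows the numbering of drawers used in the solution; the dictionary layout and the helper `other_coin` are assumptions made for illustration.

```python
from fractions import Fraction

# Drawer number -> (box, coin), numbered as in the solution above.
drawers = {1: (1, "gold"), 2: (1, "gold"),
           3: (2, "silver"), 4: (2, "silver"),
           5: (3, "silver"), 6: (3, "gold")}

def other_coin(opened):
    """Coin lying in the other drawer of the same box."""
    box, _ = drawers[opened]
    (coin,) = [c for d, (b, c) in drawers.items() if b == box and d != opened]
    return coin

# The six drawers are equally likely; keep those showing a gold coin.
gold_opened = [d for d, (_, coin) in drawers.items() if coin == "gold"]
silver_behind = [d for d in gold_opened if other_coin(d) == "silver"]
answer_b = Fraction(len(silver_behind), len(gold_opened))
print(gold_opened, answer_b)
```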

6. In the preceding problems the enumeration of cases did not 
present any difficulty. We are now going to discuss a few problems in 
which this enumeration is not so obvious but can be greatly simplified 
by the use of well-known formulas for the number of permutations, 
arrangements, and combinations. 

Let m distinct objects be represented by the letters a, b, c, . . . , l. 
Using all these objects, we can place them in different orders and form 
“permutations.” For instance, if there are only three letters, a, b, and c, 
all the possible permutations are: abc, acb, bac, bca, cab, cba; that is, 
6 different permutations out of 3 letters. In general, the number of 
permutations $P_m$ of m objects is expressed by 

$$P_m = 1 \cdot 2 \cdot 3 \cdots m = m!$$

If n objects are taken out of the total number of m objects to form 
groups, attention being paid to the order of objects in each group, then 
these groups are called “arrangements.” For instance, by taking two 
letters out of the four letters a, b, c, d, we can form the following 12 
arrangements: 

ab ba ca da 
ac bc cb db 
ad bd cd dc 

Denoting by the symbol $A_m^n$ the number of arrangements of m 
objects taken n at a time, the following formula holds: 

$$A_m^n = m(m - 1)(m - 2) \cdots (m - n + 1).$$


Again, if we form groups of n objects taken out of the total number of 
m objects, this time paying no attention to the order of objects in the 
group, we form “combinations.” For instance, the following are the 
different combinations out of 5 objects taken 3 at a time: 

abc abd abe acd ace 
ade bcd bce bde cde 

In general, the number of combinations out of m objects taken n 
at a time, which is usually denoted by the symbol $C_m^n$, is given by 

$$C_m^n = \frac{m(m - 1)(m - 2) \cdots (m - n + 1)}{1 \cdot 2 \cdot 3 \cdots n}.$$

It is useful to recall that the same expression may also be exhibited 
as follows: 

$$C_m^n = \frac{m!}{n!\,(m - n)!}$$

whence, by substituting m - n instead of n, the useful formula 

$$C_m^n = C_m^{m-n}$$

can be derived. 
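
The three counting formulas can be compared with explicit listings of the groups. The sketch below uses Python's standard `itertools` and `math` modules (`math.perm` and `math.comb` require Python 3.8 or later); the variable names are illustrative.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

# P_m: orderings of all m objects (here m = 3).
perms = ["".join(p) for p in permutations("abc")]
# A_m^n: ordered groups of n objects out of m (here m = 4, n = 2).
arrangements = ["".join(p) for p in permutations("abcd", 2)]
# C_m^n: unordered groups of n objects out of m (here m = 5, n = 3).
combos = ["".join(c) for c in combinations("abcde", 3)]

print(len(perms), factorial(3))             # listing versus m!
print(len(arrangements), perm(4, 2))        # listing versus m(m-1)...(m-n+1)
print(len(combos), comb(5, 3), comb(5, 2))  # listing versus C_5^3 and C_5^2
```

The last line also illustrates the symmetry obtained by substituting m - n for n.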
7. After these preliminary remarks, we can turn to the problems in 
which the foregoing formulas will often be used. 

Problem 8. An urn contains $a$ white balls and $b$ black balls. If 
$\alpha + \beta$ balls are drawn from this urn, find the probability that among 
them there will be exactly $\alpha$ white and $\beta$ black balls. 

Solution. If we do not distinguish the order in which the balls come 
out of the urn, the total number of ways to get $\alpha + \beta$ balls out of the 
total number $a + b$ balls is obviously expressed by $C_{a+b}^{\alpha+\beta}$, and this is 
the number of all possible and equally likely cases in this problem. The 
number of ways to draw $\alpha$ white balls out of the total number $a$ of white 
balls in the urn is $C_a^{\alpha}$, and similarly $C_b^{\beta}$ represents the number of ways 
of drawing $\beta$ black balls out of the total number $b$ of black balls. Now 
every group of $\alpha$ white balls combines with every possible group of $\beta$ 
black balls to form the total of $\alpha$ white balls and $\beta$ black balls, so that 
the number of ways to form all the groups containing $\alpha$ white balls and 
$\beta$ black balls is $C_a^{\alpha} \cdot C_b^{\beta}$. This is also the number of favorable cases; 
hence, the required probability is 

$$p = \frac{C_a^{\alpha} \cdot C_b^{\beta}}{C_{a+b}^{\alpha+\beta}}$$

or, in a more explicit form, 

$$p = \frac{1 \cdot 2 \cdots (\alpha + \beta)}{1 \cdot 2 \cdots \alpha \cdot 1 \cdot 2 \cdots \beta} \cdot \frac{a(a - 1) \cdots (a - \alpha + 1) \cdot b(b - 1) \cdots (b - \beta + 1)}{(a + b)(a + b - 1) \cdots (a + b - \alpha - \beta + 1)}. \tag{1}$$
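
Expression (1) is easy to evaluate with exact arithmetic. The sketch below is only an illustration: the function name `prob_white_black` and the numerical values (3 white and 5 black balls) are assumptions chosen for the example, not taken from Problem 8.

```python
from fractions import Fraction
from math import comb

def prob_white_black(a, b, alpha, beta):
    """Probability of exactly alpha white and beta black balls among
    alpha + beta balls drawn from an urn with a white and b black balls."""
    return Fraction(comb(a, alpha) * comb(b, beta), comb(a + b, alpha + beta))

# Illustrative urn: 3 white and 5 black balls, 2 balls drawn.
for alpha in range(3):
    print(alpha, prob_white_black(3, 5, alpha, 2 - alpha))
```

Since the three cases exhaust all possibilities, the printed probabilities add up to 1.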


Problem 9. An urn contains n tickets bearing numbers from 1 to n, 
and m tickets are drawn at a time. What is the probability that i of 
the tickets removed have numbers previously specified? 

Solution. This problem does not essentially differ from the preceding 
one. In fact, i tickets with preassigned numbers can be likened to i 
white balls, while the remaining tickets correspond to the black balls. 
The required probability, therefore, can be obtained from the expression 
(1) by taking $a = i$, $b = n - i$, $\alpha = i$, $\beta = m - i$ and, all simplifications 
performed, will be given by 

$$\frac{m(m - 1) \cdots (m - i + 1)}{n(n - 1) \cdots (n - i + 1)}. \tag{2}$$

The conditions of this problem were realized in the French lottery, 
which was operated by the French royal government for a long time but 
discontinued soon after the Revolution of 1789. Similar lotteries 
continued to exist in other European countries throughout the nineteenth 
century. In the French lottery, tickets bearing numbers from 1 to 90 
were sold to the people, and at regular intervals drawings for winning 
numbers were held in different French cities. At each drawing, 5 
numbers were drawn. If a holder of tickets won on a single number, 
he received 15 times its cost to him. If he won on two, three, four, or 
five tickets, he could claim respectively 270, 5,500, 75,000, and, finally, 
1,000,000 times their cost to him. 

The numerical values of the probabilities corresponding to these 
different cases are worked out as follows: we must take n = 90, m = 5, 
and i = 1, 2, 3, 4, or 5 in the expression (2). The results are 

Single ticket    $\frac{5}{90} = \frac{1}{18}$ 
Two tickets      $\frac{5 \cdot 4}{90 \cdot 89} = \frac{2}{801}$ 
Three tickets    $\frac{5 \cdot 4 \cdot 3}{90 \cdot 89 \cdot 88} = \frac{1}{11748}$ 
Four tickets     $\frac{5 \cdot 4 \cdot 3 \cdot 2}{90 \cdot 89 \cdot 88 \cdot 87} = \frac{1}{511038}$ 
Five tickets     $\frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{90 \cdot 89 \cdot 88 \cdot 87 \cdot 86} = \frac{1}{43949268}$ 
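
The five lottery probabilities can be recomputed from the combination counts that lead to expression (2). In the sketch below the function name `lottery_prob` is an assumption; it evaluates the ratio of the number of drawings containing the i specified numbers to the number of all drawings, which reduces to expression (2).

```python
from fractions import Fraction
from math import comb

def lottery_prob(n, m, i):
    """Probability that i specified numbers are all among the m numbers
    drawn out of n."""
    return Fraction(comb(n - i, m - i), comb(n, m))

for i in range(1, 6):
    print(i, lottery_prob(90, 5, i))
```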


8. Problem 10. From an urn containing $a$ white balls and $b$ black 
ones, a certain number of balls, $k$, is drawn, and they are laid aside, their 
color unnoted. Then one more ball is drawn; and it is required to find 
the probability that it is a white or a black ball. 

Solution. Suppose the $k$ balls removed at first and the last ball 
drawn are laid on $k + 1$ different places, so that the last ball occupies 
the position at the extreme right. The number of ways to form groups 
of $k + 1$ balls out of the total number of $a + b$ balls, attention being 
paid to the order, is 

$$(a + b)(a + b - 1) \cdots (a + b - k).$$

Such is the total number of cases in this problem, and they may all be 
considered as equally likely. To find the number of cases favorable to 
a white ball, we observe that the last place should be occupied by one of 
the $a$ white balls. Whatever this white ball is, the preceding $k$ balls 
form one of the possible arrangements out of $a + b - 1$ remaining balls 
taken $k$ at a time. Hence, it is obvious that the number of cases favorable 
to a white ball is 

$$a(a + b - 1) \cdots (a + b - k),$$

and therefore the required probability is given by 

$$\frac{a}{a + b}$$

for a white ball. In a similar way we find the probability $b/(a + b)$ of 
drawing a black ball. These results show that the probability of getting 
white or black balls in this problem is the same as if no balls at all were 
removed at first. Here we have proof that the peculiar circumstances 
observed in Prob. 6 are general. 
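
For small urns the result of Problem 10 can be verified by brute force, listing every ordered draw of k + 1 balls. The sketch below is illustrative; the function name `last_ball_white` and the sample values a = 3, b = 5 (the urn of Prob. 6) with k = 2 are choices made for the example.

```python
from fractions import Fraction
from itertools import permutations

def last_ball_white(a, b, k):
    """Enumerate all ordered draws of k + 1 balls from a white and b black
    balls and return the probability that the last ball drawn is white."""
    balls = ["W"] * a + ["B"] * b
    cases = list(permutations(range(a + b), k + 1))
    favorable = sum(1 for c in cases if balls[c[-1]] == "W")
    return Fraction(favorable, len(cases))

print(last_ball_white(3, 5, 2), Fraction(3, 3 + 5))
```

The two printed values agree, in line with the formula a/(a + b).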

9. Problem 11. Two dice are thrown n times in succession. What is 
the probability of obtaining double six at least once? 

Solution. As there are 36 cases in every throw and each case of the 
first throw can combine with each case of the second throw, and so on, 
the total number of cases in n throws will be $36^n$. Instead of trying to 
find the number of favorable cases directly, it is easier to find the number 
of unfavorable cases; that is, the number of cases in which double sixes 
would be excluded. In one throw there are 35 such cases, and in n throws 
there will be $35^n$. Now, excluding these cases, we obtain $36^n - 35^n$ 
favorable cases; hence, the required probability is 

$$p = 1 - \left(\frac{35}{36}\right)^n.$$

If one die were thrown n times in succession, the probability to obtain 
6 points at least once would be 

$$p = 1 - \left(\frac{5}{6}\right)^n.$$

Now, suppose we want to find the number of throws sufficient to 
assure a probability > $\frac{1}{2}$ of obtaining double six at least once. To this 
end we must solve the inequality 

$$\left(\frac{35}{36}\right)^n < \frac{1}{2}$$

for n; whence we find 

$$n > \frac{\log 2}{\log 36 - \log 35} = 24.6 \ldots$$

It means that in 25 throws there is more likelihood to obtain double 
six at least once than not to obtain it at all. On the other hand, in 
24 throws, we have less chance to succeed than to fail. 

Now, if we dealt with a single die, we should find that in 4 throws 
there are more chances to obtain 6 points at least once than there are 
chances to fail. 

This problem is interesting in a historical respect, for it was the first 
problem on probability solved by Pascal, who, together with his great 
contemporary Fermat, had laid the first foundations of the theory of 
probability. This problem was suggested to Pascal by a certain French 
nobleman, Chevalier de Méré, a man of great experience in gambling. 
He had observed the advantage of betting for double six in 25 throws 
and for one six (with a single die) in 4 throws. He found it difficult to 
understand because, he said, there were 36 cases for two dice and 6 cases 
for one die in each throw, and yet it is not true that 25:4 = 36:6. Of 
course, there is no reason for such an arbitrary conclusion, and the correct 
solution as given by Pascal not only removed any apparent paradoxes 
in this case, but it led to the same number, 25, observed by gamblers in 
their daily experience. 
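
The numbers 25 and 4 in de Méré's observation follow from the inequality solved above. The sketch below is a numerical illustration; the function names are assumptions, and `q` stands for the probability of failing in a single throw (35/36 for two dice, 5/6 for one die).

```python
from math import floor, log

def p_at_least_once(q, n):
    """Probability 1 - q**n of at least one success in n throws."""
    return 1 - q ** n

def throws_needed(q):
    """Smallest n for which 1 - q**n exceeds 1/2, from n > log 2 / (-log q)."""
    return floor(log(2) / -log(q)) + 1

print(throws_needed(35 / 36), throws_needed(5 / 6))
print(p_at_least_once(35 / 36, 24), p_at_least_once(35 / 36, 25))
```

The second line shows the probability passing from below 1/2 at 24 throws to above 1/2 at 25 throws.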

10. Problem 12. A certain number n of identical balls is distributed 
among N compartments. What is the probability that a certain specified 
compartment will contain h balls? 

Solution. To find the number of all possible cases in this problem, 
suppose that we distinguish the balls by numbering them from 1 to n. 
The ball with the number 1 may fall into any of the N compartments, 
which gives N cases. The ball with the number 2 may also fall into any 
one of the N compartments; so that the number of cases for 2 balls will 
be $N \cdot N = N^2$. Likewise, for 3 balls the number of cases will be 

$$N^2 \cdot N = N^3,$$

and for any number n of balls the number of cases will be $N^n$. To find 
the number of favorable cases, first suppose that a group of h specified 
balls falls into a designated compartment. The remaining n - h balls may 
be distributed in any way among N - 1 remaining compartments. But 
the number of ways to distribute n - h balls among N - 1 compartments 
is $(N - 1)^{n-h}$, and this becomes the number of all favorable cases 
in which a specified group of h balls occupies the designated compartment. 
Now, it is possible to form $C_n^h$ such groups; therefore, the total number of 
favorable cases is given by 

$$C_n^h (N - 1)^{n-h}$$

and the required probability will be 

$$P_h = \frac{C_n^h (N - 1)^{n-h}}{N^n}.$$

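In case n, N and h are large numbers, the direct application of this 
formula becomes difficult, and it is advisable to seek an approximate 
expression for $P_h$. To this end we write the preceding expression thus: 

$$P_h = \frac{\left(\frac{n}{N}\right)^h}{1 \cdot 2 \cdot 3 \cdots h} \left(1 - \frac{1}{N}\right)^{n-h} P,$$

where 

$$P = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{h - 1}{n}\right).$$

Now, supposing $1 \le k \le h - 1$, we have 

$$\left(1 - \frac{k}{n}\right)\left(1 - \frac{h - k}{n}\right) = 1 - \frac{h}{n} + \frac{k(h - k)}{n^2} > 1 - \frac{h}{n}. \tag{a}$$

On the other hand, 

$$k(h - k) \le \frac{h^2}{4}$$

and so 

$$\left(1 - \frac{k}{n}\right)\left(1 - \frac{h - k}{n}\right) \le \left(1 - \frac{h}{2n}\right)^2. \tag{b}$$

The inequalities (a) and (b) give simple lower and upper limits for P. 
For we can write $P^2$ thus: 

$$P^2 = \prod_{k=1}^{h-1} \left(1 - \frac{k}{n}\right)\left(1 - \frac{h - k}{n}\right)$$

and then apply (a) or (b), which leads to these inequalities 

$$\left(1 - \frac{h}{2n}\right)^{h-1} \ge P > \left(1 - \frac{h}{n}\right)^{\frac{h-1}{2}}.$$

Correspondingly, we have 

$$P_h \le \frac{\left(\frac{n}{N}\right)^h}{1 \cdot 2 \cdot 3 \cdots h} \left(1 - \frac{1}{N}\right)^{n-h} \left(1 - \frac{h}{2n}\right)^{h-1},$$

$$P_h > \frac{\left(\frac{n}{N}\right)^h}{1 \cdot 2 \cdot 3 \cdots h} \left(1 - \frac{1}{N}\right)^{n-h} \left(1 - \frac{h}{n}\right)^{\frac{h-1}{2}}.$$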
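
The quality of these limits can be judged numerically by comparing them with the exact value of $P_h$. The sketch below uses n = 200, N = 20, h = 10, the values that appear in the exercises at the end of this chapter; the function names `p_exact` and `p_bounds` are assumptions introduced for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def p_exact(n, N, h):
    """Exact value of P_h = C_n^h (N - 1)^(n - h) / N^n."""
    return Fraction(comb(n, h) * (N - 1) ** (n - h), N ** n)

def p_bounds(n, N, h):
    """Lower and upper limits for P_h following from inequalities (a) and (b)."""
    common = (n / N) ** h / factorial(h) * (1 - 1 / N) ** (n - h)
    lower = common * (1 - h / n) ** ((h - 1) / 2)
    upper = common * (1 - h / (2 * n)) ** (h - 1)
    return lower, upper

n, N, h = 200, 20, 10
lower, upper = p_bounds(n, N, h)
exact = float(p_exact(n, N, h))
print(lower, exact, upper)
```

For these values the two limits already agree to three decimals, so either may serve as an approximation of $P_h$.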


Problem 13. What is the probability of obtaining a given sum s of 
points with n dice? 

Solution. The number of all cases for n dice is evidently $6^n$. The 
number of favorable cases is the same as the total number of solutions of 
the equation 

$$\alpha_1 + \alpha_2 + \cdots + \alpha_n = s \tag{1}$$

where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are integers from 1 to 6. This number can be 
determined by means of the following device: Multiplying the polynomial 

$$x + x^2 + x^3 + x^4 + x^5 + x^6 \tag{2}$$

by itself, the product will consist of terms 

$$x^{\alpha_1 + \alpha_2}$$

where $\alpha_1$ and $\alpha_2$ independently assume all integral values from 1 to 6. 
Collecting terms with the same exponent s, the coefficient of $x^s$ will give 
the number of solutions of the equation 

$$\alpha_1 + \alpha_2 = s,$$

$\alpha_1$, $\alpha_2$ being subject to the above mentioned limitations. 

Similarly, multiplying the same polynomial (2) three times by itself 
and collecting terms with the same exponent s, the coefficient of $x^s$ will 
give the number of solutions of equation (1) for n = 3. In general, the 
number of solutions of equation (1) for any n is the coefficient of $x^s$ in 
the expanded polynomial 

$$(x + x^2 + x^3 + x^4 + x^5 + x^6)^n.$$

Now we have identically 

$$x + x^2 + x^3 + x^4 + x^5 + x^6 = \frac{x(1 - x^6)}{1 - x}$$

and by the binomial theorem 

$$x^n (1 - x^6)^n = x^n \sum_{l=0}^{n} (-1)^l C_n^l x^{6l},$$

$$(1 - x)^{-n} = \sum_{k=0}^{\infty} C_{n+k-1}^{n-1} x^k.$$

Multiplying these series we find the following expression as the 
coefficient of $x^s$: 

$$\sum_{l=0}^{l \le \frac{s-n}{6}} (-1)^l C_n^l C_{s-6l-1}^{n-1}$$

where summation extends over integers l not exceeding $\frac{s - n}{6}$. The same 
sum represents the number of favorable cases. Dividing it by $6^n$, we 
get the following expression for the probability of s points on n dice: 

$$\frac{1}{6^n} \sum_{l=0}^{l \le \frac{s-n}{6}} (-1)^l C_n^l C_{s-6l-1}^{n-1}.$$
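
The closed expression for the number of favorable cases can be compared with a direct enumeration for a small number of dice. The following sketch is illustrative only; the function names `prob_sum` and `prob_sum_brute` are assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_sum(n, s):
    """Probability of the total s with n dice, from the alternating sum
    over integers l not exceeding (s - n)/6."""
    if s < n:
        return Fraction(0)
    favorable = sum((-1) ** l * comb(n, l) * comb(s - 6 * l - 1, n - 1)
                    for l in range((s - n) // 6 + 1))
    return Fraction(favorable, 6 ** n)

def prob_sum_brute(n, s):
    """The same probability by listing all 6**n cases."""
    rolls = list(product(range(1, 7), repeat=n))
    return Fraction(sum(1 for r in rolls if sum(r) == s), len(rolls))

print(prob_sum(3, 9), prob_sum(3, 10), prob_sum(3, 11))
```

The direct listing is practical only for a few dice, whereas the sum has at most n + 1 terms.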


The preceding problems suffice to illustrate how probability can be 
determined by direct enumeration of cases. For the benefit of students, 
a few simple problems without elaborate solutions are added here. 





Problems for Solution 

1. What is the probability of obtaining 9, 10, 11 points with 3 dice? 
Ans. $\frac{25}{216}$, $\frac{27}{216}$, $\frac{27}{216}$. 

2. What is the probability of obtaining 2 heads and 2 tails when 4 coins are 
thrown? Ans. $\frac{3}{8}$. 

3. Two urns contain respectively 3 white, 7 red, 15 black balls, and 10 white, 
6 red, 9 black balls. One ball is taken from each urn. What is the probability that 
they both will be of the same color? Ans. $\frac{207}{625}$. 

4. What is the probability that of 6 cards taken from a full pack, 3 will be black 
and 3 red? Ans. $\frac{(C_{26}^3)^2}{C_{52}^6} = 0.332$ approximately. 

5. Ten cards are taken from a full pack. What is the probability of finding 
among them (a) at least one ace; (b) at least two aces? Ans. (a) $\frac{349}{595}$; (b) $\frac{1257}{7735}$. 

6. The face cards are removed from a full pack. Out of the 40 remaining cards, 
4 are drawn. What is the probability that they belong to different suits? 
Ans. $\frac{1000}{9139}$. 

7. Under the same conditions, what is the probability that the 4 cards belong to 
different suits and different denominations? Ans. $\frac{504}{9139}$. 

8. Five cards are taken from a full pack. Find the probabilities (a) that they are
of different denominations; (b) that 2 are of the same denomination and 3 scattered;
(c) that one pair is of one denomination and another pair of a different denomination,
and one odd; (d) that 3 are of the same denomination and 2 scattered; (e) that 2 are
of one denomination and 3 of another; (f) that 4 are of one denomination and 1 of
another.

Ans. (a) 2112/4165; (b) 1760/4165; (c) 198/4165; (d) 88/4165; (e) 6/4165; (f) 1/4165.

9. What is the probability that 5 tickets taken in succession in the French lottery
will present an increasing or decreasing sequence of numbers? Ans. 1/60.

10. What is the probability that among 5 tickets drawn in the French lottery there
is at least one with a one-digit number? Ans. 0.417.

11. Twelve balls are distributed at random among three boxes. What is the
probability that the first box will contain 3 balls? Ans. $\frac{55 \cdot 2^{11}}{3^{12}}$.

12. In Prob. 12 (page 22) what is the most probable number of balls in a specified
box? Ans. The probability

$$P_h = \frac{C_n^h (N - 1)^{n-h}}{N^n}$$

is the greatest if the integer h is determined by the conditions

$$\frac{n + 1}{N} - 1 \le h \le \frac{n + 1}{N}.$$


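13. Apply these considerations to the case of n = 200, N = 20. Ans. Since
h = 10 the inequalities on page 23 give

$$P_{10} < \frac{10^{10}}{10!}\left(1 - \frac{1}{20}\right)^{190}\left(1 - \frac{1}{40}\right)^{9},$$

$$P_{10} > \frac{10^{10}}{10!}\left(1 - \frac{1}{20}\right)^{190}\left(1 - \frac{1}{20}\right)^{9/2}.$$

To find an approximate value of $\left(1 - \frac{1}{20}\right)^{190}$ note that

$$1 - \frac{1}{20} = e^{-\frac{1}{20} - \frac{1}{2 \cdot 20^2} - \frac{1}{3 \cdot 20^3} - \cdots}.$$

To 3 decimals

$$P_{10} = 0.128.$$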


14. Four different objects, 1, 2, 3, 4, are distributed at random on four places
marked 1, 2, 3, 4. What is the probability that none of the objects occupies the place
corresponding to its number? Ans. 3/8.

15. Two urns contain, respectively, 1 black and 2 white balls, and 2 black and
1 white ball. One ball is transferred from the first urn into the second, after which a
ball is drawn from the second urn. What is the probability that it is white?

Ans. 5/12.

16. What is the probability of getting 20 points with 6 dice?

Ans. 0.09047.

17. An urn contains a white and b black balls. Balls are drawn one by one until
only those of the same color are left. What is the probability that they are white?

Ans. $\frac{a}{a + b}$.

18. In an urn there are n groups of p objects each. Objects in different groups are
distinguished by some characteristic property. What is the probability that among
$\alpha_1 + \alpha_2 + \cdots + \alpha_n$ objects ($0 \le \alpha_i \le p$; i = 1, 2, . . . n) taken, there are $\alpha_1$ of
one group, $\alpha_2$ of another, etc.? Ans. Let $\lambda$ among the numbers $\alpha_1, \alpha_2, \ldots \alpha_n$ be
equal to a, $\mu$ be equal to b, . . . $\sigma$ be equal to l. The required probability is

$$\frac{n!}{\lambda!\,\mu! \cdots \sigma!} \cdot \frac{C_p^{\alpha_1} C_p^{\alpha_2} \cdots C_p^{\alpha_n}}{C_{np}^{\alpha_1 + \alpha_2 + \cdots + \alpha_n}}.$$

Problem 8 is a particular case of this.

19. There are N tickets numbered 1, 2, . . . N of which n are taken at random and
arranged in increasing order of their numbers: $x_1 < x_2 < \cdots < x_n$. What is the
probability that $x_m = M$?

Ans. $\frac{C_{M-1}^{m-1} C_{N-M}^{n-m}}{C_N^n}$.



CHAPTER II 


THEOREMS OF TOTAL AND COMPOUND PROBABILITY 

1. As the problems become more complex the difficulties in enumerat- 
ing cases grow and often the computation of probabilities by direct 
application of definition becomes very involved. In many cases the 
complications can be avoided by use of two theorems which are funda- 
mental in the theory of probability. 

Before we can give a clear and exact statement of the first fundamental
theorem, we must define what is meant by “mutually exclusive” or
“incompatible” events. Events are called mutually exclusive or
incompatible if the occurrence of one of them precludes the occurrence
of all the others. For instance, the four events concerning the number
of points on two dice

    First Die    Second Die
        1            4
        2            3
        3            2
        4            1

are mutually exclusive because it is evident that as soon as one of them
occurs, none of the others can materialize.

On the contrary, events are compatible if it is possible for them to
materialize simultaneously. For instance, the events of 5 points on one
die and 5 points on the other, are compatible, since in tossing two dice
it is possible to get 5 points on each.

To denote the probability of an event A, we shall use the symbol (A).
To denote the probability of A or B (or both) we shall use the symbol
(A + B). Dealing with several events A, B, . . . L, the symbol

$$(A + B + \cdots + L)$$

will denote the probability of the occurrence of at least one of them.
If A, B, . . . L are mutually exclusive events, this symbol represents
the probability of the occurrence of one of them without specification as
to which one.

2. Now we shall state the first fundamental theorem, called the
“theorem of total probability” or “theorem of addition of probabilities,”
in the following way:

Theorem of Total Probability. The probability for one of the mutually
exclusive events $A_1, A_2, \ldots A_n$ to materialize is the sum of the probabilities
of these events. In symbolical notations, it is expressed thus:

$$(A_1 + A_2 + \cdots + A_n) = (A_1) + (A_2) + \cdots + (A_n).$$

Proof. Let N be the number of all possible and equally likely cases
out of which $m_1$ cases are favorable to the event $A_1$, $m_2$ cases are favorable
to the event $A_2$, . . . , and finally, $m_n$ cases are favorable to the event $A_n$.
These cases are all different, since events $A_1, A_2, \ldots A_n$ are incompatible.
The number of cases favorable to either $A_1$ or $A_2$, . . . or $A_n$ is
therefore

$$m_1 + m_2 + \cdots + m_n.$$

Hence, by definition

$$(A_1 + A_2 + \cdots + A_n) = \frac{m_1 + m_2 + \cdots + m_n}{N} = \frac{m_1}{N} + \frac{m_2}{N} + \cdots + \frac{m_n}{N}.$$

Again, by definition of probability,

$$\frac{m_1}{N} = (A_1); \quad \frac{m_2}{N} = (A_2); \quad \ldots \quad \frac{m_n}{N} = (A_n),$$

and so finally

$$(A_1 + A_2 + \cdots + A_n) = (A_1) + (A_2) + \cdots + (A_n),$$

as stated.

3. It is important to know that the same theorem, stated in a slightly
different form, is especially useful in applications. An event A can
occur in several mutually exclusive forms, $A_1, A_2, \ldots A_n$, which may
be considered as that many mutually exclusive events. Whenever A
occurs, one of these events must occur, and conversely. Consequently,
the probability of A is the same as the probability of one (unspecified)
of its mutually exclusive forms. If, for instance, occurrence of 5 points
on two dice is A, then this event occurs in 4 mutually exclusive forms, as
tabulated above.

From the new point of view, the theorem of total probability can be
stated thus:

Second Form of Theorem of Total Probability. The probability of
an event A is the sum of the probabilities of its mutually exclusive forms
$A_1, A_2, \ldots A_n$; or, using symbols,

$$(A) = (A_1) + (A_2) + \cdots + (A_n).$$

Probabilities $(A_1), (A_2), \ldots (A_n)$ are partial probabilities of incompatible
forms of A. Since the probability (A) is their sum, it may be called
a total probability of A. Hence the name of the theorem.

In the preceding example we saw that 5 points on two dice could be
obtained in 4 mutually exclusive ways. Now the probability of any one
of these ways is 1/36; hence, by the preceding theorem, the probability
of obtaining 5 points with two dice is

$$\frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{4}{36} = \frac{1}{9}$$

as it should be.
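The same computation can be reproduced by counting cases. The sketch below is an illustration added here, not the author's; the helper `prob` and the variable names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely cases for two dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Probability of an event given as a predicate on (first die, second die)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


# The four mutually exclusive forms of the event "5 points on two dice".
forms = [(1, 4), (2, 3), (3, 2), (4, 1)]
partial = [prob(lambda o, f=f: o == f) for f in forms]

# Total probability computed directly from the definition.
total = prob(lambda o: o[0] + o[1] == 5)
```

Each partial probability is 1/36, and their sum coincides with the directly counted probability of 5 points.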

If events $A_1, A_2, \ldots A_n$ are not only mutually exclusive, but
“exhaustive,” which means that one of them must necessarily take place,
the probability that one of them will happen is a certainty = 1, so that
we must have

$$(A_1) + (A_2) + \cdots + (A_n) = 1.$$

An event which is not certain may or may not happen; this constitutes
two mutually exclusive cases. It is customary to call the nonoccurrence of a
certain event A the “event opposite” to A, and we shall denote it
by the symbol $\bar{A}$. Now A and $\bar{A}$ constitute two exhaustive and mutually
exclusive cases. Hence, by the preceding remark

$$(A) + (\bar{A}) = 1.$$

That is, if p is the probability of A

$$q = 1 - p$$

represents the probability that A will not occur.

4. If an event A is considered in connection with another event B,
the compound event AB consists in simultaneous occurrence of A and B.
For three events A, B, C, the compound event ABC consists in simultaneous
occurrence of A and B and C, and so on for any number of
component events. We shall denote the probability of a compound
event AB . . . L by the symbol

$$(AB \ldots L).$$

An event A can materialize in two mutually exclusive forms, namely,
as A and B or A and $\bar{B}$. Hence, by the theorem of total probability

$$(A) = (AB) + (A\bar{B}).$$

Similarly

$$(B) = (BA) + (B\bar{A}),$$

or, since the symbol (BA) does not depend upon the order of letters,

$$(B) = (AB) + (\bar{A}B).$$

The sum (A) + (B) can be expressed as

$$(A) + (B) = (AB) + [(AB) + (A\bar{B}) + (\bar{A}B)].$$

Again, by the theorem of total probability, the sum

$$(AB) + (A\bar{B}) + (\bar{A}B)$$

represents the probability (A + B) of the occurrence of at least one of
the events A or B. The preceding equation leads to the useful formula

$$(A + B) = (A) + (B) - (AB) \tag{1}$$

which obviously is a generalization of the theorem of total probability;
for (AB) = 0 if A and B are incompatible. Equation (1) can be used to
derive an important inequality. Since $(A + B) \le 1$, it follows from (1)
that

$$(AB) \ge (A) + (B) - 1.$$

If B itself is a compound event $A_1A_2$, this inequality leads to

$$(AA_1A_2) \ge (A) + (A_1A_2) - 1.$$

But

$$(A_1A_2) \ge (A_1) + (A_2) - 1,$$

and so

$$(AA_1A_2) \ge (A) + (A_1) + (A_2) - 2$$

for three component events. Proceeding in the same manner, we can
establish the following general inequality:

$$(AA_1A_2 \cdots A_{n-1}) \ge (A) + (A_1) + (A_2) + \cdots + (A_{n-1}) - (n - 1).$$

Applying this inequality to events $\bar{A}, \bar{A}_1, \ldots \bar{A}_{n-1}$ respectively
opposite to $A, A_1, \ldots A_{n-1}$, we get

$$(\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}) \ge (\bar{A}) + (\bar{A}_1) + \cdots + (\bar{A}_{n-1}) - (n - 1),$$

or, since $(\bar{A}_i) = 1 - (A_i)$,

$$(A) + (A_1) + \cdots + (A_{n-1}) \ge 1 - (\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}).$$

Now the compound event $\bar{A}\bar{A}_1 \ldots \bar{A}_{n-1}$ means that neither A nor
$A_1$, . . . nor $A_{n-1}$ occurs. The event opposite to this is that at least
one of the events $A, A_1, \ldots A_{n-1}$ occurs. Hence,

$$1 - (\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}) = (A + A_1 + \cdots + A_{n-1}),$$

and we reach the following important inequality:

$$(A + A_1 + \cdots + A_{n-1}) \le (A) + (A_1) + \cdots + (A_{n-1}).$$
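Formula (1) and the two inequalities of this section can be tried on a small set of equally likely cases. The following sketch is an added illustration; the three events on two dice are chosen arbitrarily for the example and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Equally likely cases: ordered results of two dice.
space = set(product(range(1, 7), repeat=2))


def prob(event: set) -> Fraction:
    """Probability as favorable cases over all cases."""
    return Fraction(len(event), len(space))


A = {o for o in space if o[0] == 6}      # six on the first die
B = {o for o in space if o[1] == 6}      # six on the second die
C = {o for o in space if sum(o) >= 10}   # at least 10 points in all

# Formula (1): (A + B) = (A) + (B) - (AB)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)

# Lower bound for a compound event and upper bound for "at least one".
lower_ok = prob(A & B & C) >= prob(A) + prob(B) + prob(C) - 2
upper_ok = prob(A | B | C) <= prob(A) + prob(B) + prob(C)
```

Here the union and intersection of sets play the roles of the symbols A + B and AB.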

5. Equation (1) can be extended to the case of more than two events.
Let B mean the occurrence of at least one of the events $A_1$ or $A_2$. Then
by (1)

$$(A + A_1 + A_2) = (A) + (A_1 + A_2) - (AB).$$

As to $(A_1 + A_2)$, its expression is given by (1). The compound event
AB means the occurrence of one at least of the events $AA_1$ or $AA_2$.
Hence, applying equation (1) once more, we find

$$(AB) = (AA_1 + AA_2) = (AA_1) + (AA_2) - (AA_1A_2)$$

and after due substitutions

$$(A + A_1 + A_2) = (A) + (A_1) + (A_2) - (AA_1) - (AA_2) - (A_1A_2) + (AA_1A_2).$$

Proceeding in the same way and using mathematical induction, the
following general formula can be established:

$$(A + A_1 + \cdots + A_{n-1}) = \sum (A_i) - \sum (A_iA_j) + \sum (A_iA_jA_k) - \cdots$$

where $A_0$ stands for A and summations refer to all combinations of subscripts taken from
numbers 0, 1, 2, . . . n - 1, one, two, three, . . . , and n at a time.
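The general formula lends itself to a mechanical check. Below is a minimal sketch, added for illustration, that sums over all combinations of events with alternating signs and compares the result with a direct count; the choice of three dice and of the events "a six on the i-th die" is an assumption made for the example.

```python
from fractions import Fraction
from itertools import combinations, product

space = set(product(range(1, 7), repeat=3))  # three dice, 216 cases


def prob(event: set) -> Fraction:
    return Fraction(len(event), len(space))


def union_by_general_formula(events) -> Fraction:
    """Alternating sum over combinations taken one, two, three, ... at a time."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group))
    return total


# Event i: a six appears on the i-th die.
events = [{o for o in space if o[i] == 6} for i in range(3)]
direct = prob(set.union(*events))
```

The direct count gives 91/216, which is also 1 - (5/6)^3, the probability of at least one six on three dice.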

6. Let A and B be two events whose probabilities are (A) and (B).
It is understood that the probability (A) is determined without any
regard to B when nothing is known about the occurrence or nonoccurrence
of B. When it is known that B occurred, A may have a different
probability, which we shall denote by the symbol (A, B) and call “conditional
probability of A, given that B has actually happened.”

Now we can state the second fundamental theorem, called the
“theorem of compound probability” or “theorem of multiplication of
probabilities,” as follows:

Theorem of Compound Probability. The probability of simultaneous
occurrence of A and B is given by the product of the unconditional probability
of the event A by the conditional probability of B, supposing that A actually
occurred. In other words,

$$(AB) = (A) \cdot (B, A).$$

Proof. Let N denote the total number of equally likely cases among
which m cases are favorable to the event A. The cases favorable to A
and B are to be found among the m cases favorable to A. Let their
number be $m_1$. Then, by the definition of probability,

$$(AB) = \frac{m_1}{N}$$

which also can be written thus:

$$(AB) = \frac{m}{N} \cdot \frac{m_1}{m}.$$

Now the ratio m/N represents the probability of A. To find the meaning
of the second factor, we observe that, assuming the occurrence of A,
there are only m equally likely cases left (the remaining N - m cases
becoming impossible) out of which $m_1$ are favorable to B. Hence the
ratio $m_1/m$ represents the conditional probability (B, A) of B supposing
that A has actually happened.

Now since

$$\frac{m}{N} = (A), \qquad \frac{m_1}{m} = (B, A),$$

the probability of the compound event AB is expressed by the product

$$(AB) = (A) \cdot (B, A).$$


Since the compound event AB involves A and B symmetrically,
we shall have also

$$(AB) = (B) \cdot (A, B).$$

The theorem of compound probability can easily be extended to several
events. For example, let us consider three events, A, B, C. The occurrence
of A and B and C is evidently equivalent to the occurrence of the
compound event AB and C. We have, therefore,

$$(ABC) = (AB) \cdot (C, AB)$$

by the theorem of compound probability. By the same theorem

$$(AB) = (A) \cdot (B, A),$$

so that

$$(ABC) = (A) \cdot (B, A) \cdot (C, AB).$$

Obviously this formula can be extended to compound events consisting
of more than three components.

In one particular but very important case, the expression for the 
compound probability can be simplified; namely, in the case of so-called 
“independent events.” Several events are “independent” by definition
if the probability of any one of them is not affected by supplementary 
knowledge concerning the materialization of any number of the remaining 
events. For instance, if A and B represent white balls drawn from 
two different urns, the probability of A is the same whether the color 
of the ball drawn from the other urn is known or not. Similarly, granted 
that a coin is unbiased, heads at the first throw and heads at the second 
throw are independent events. In such theoretical cases the inde- 
pendence of events can be reasonably assumed or agreed upon. In other 
cases, and especially in practical applications, it is not easy to decide 
whether events should be considered as independent or not. 

If A and B are independent, the conditional probability (B, A) is
the same as the probability (B) found without any reference to A; this
follows from the definition of independence. Hence, the expression of
compound probability (AB) for two independent events becomes

$$(AB) = (A) \cdot (B)$$

so that the probability of a compound event with independent components
is simply equal to the product of the probabilities of component
events. This rule extends to any number of component events if they
are independent. Let us consider three independent events, A, B, and C.
The independence of these events implies

$$(B, A) = (B); \quad (C, AB) = (C)$$

and hence

$$(ABC) = (A) \cdot (B) \cdot (C)$$

in accordance with the rule.

To illustrate the theorem of compound probability, let us consider
two simple examples. An urn contains 2 white balls and 3 black ones.
Two balls are drawn, and it is required to find the probability that they
are both white. Let A be the event consisting in the white color of the
first ball, and B the event consisting in the white color of the second ball.
The probability (A) of extracting a white ball in the first place is

$$(A) = \frac{2}{5}.$$

To find the conditional probability (B, A) we observe, after drawing one
white ball, that 1 white and 3 black balls remain in the urn. The
probability of drawing a white ball under such circumstances is

$$(B, A) = \frac{1}{4}.$$

Now, by the theorem of compound probability, we shall have

$$(AB) = \frac{2}{5} \cdot \frac{1}{4} = \frac{1}{10}.$$

Evidently, in this example we dealt with dependent events.
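The urn example follows the proof of the theorem step by step: N cases in all, m favorable to A, and m1 favorable to both A and B. A small sketch, added here for illustration with ball labels invented for the purpose, counts these cases explicitly.

```python
from fractions import Fraction
from itertools import permutations

# Label the balls so that every ordered pair of draws is a distinct case.
balls = ["w1", "w2", "b1", "b2", "b3"]
draws = list(permutations(balls, 2))  # N = 20 equally likely ordered draws


def is_white(ball: str) -> bool:
    return ball.startswith("w")


m = sum(1 for d in draws if is_white(d[0]))                       # cases favorable to A
m1 = sum(1 for d in draws if is_white(d[0]) and is_white(d[1]))   # favorable to A and B

p_A = Fraction(m, len(draws))     # (A) = m / N
p_B_given_A = Fraction(m1, m)     # (B, A) = m1 / m
p_AB = Fraction(m1, len(draws))   # (AB) = m1 / N
```

The counts give (A) = 2/5, (B, A) = 1/4 and (AB) = 1/10, so that the product of the first two equals the third.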

As an example of independent events, let a coin be tossed any given
number of times; say, n times. What is the probability of having only
heads? The compound event in this example consists of n independent
components; namely, heads at every trial. Now the probability of
heads in any trial is 1/2 and so the required probability will be $1/2^n$.

Note: Two events A and B are independent by definition, if

$$(A, B) = (A) \quad \text{and} \quad (B, A) = (B).$$

However, one of these conditions follows from the other. Suppose the condition

$$(A, B) = (A)$$

is fulfilled, so that A is independent of B. We have then

$$(AB) = (B) \cdot (A).$$

On the other hand,

$$(AB) = (A) \cdot (B, A)$$

whence

$$(B, A) = (B),$$

so that B is independent of A.

Three events A, B, C are independent if the following four conditions are fulfilled:

$$(A, B) = (A); \quad (A, C) = (A); \quad (B, C) = (B); \quad (C, AB) = (C).$$

From the first three conditions it follows that

$$(B, A) = (B); \quad (C, A) = (C); \quad (C, B) = (C).$$

To show that the other requirements

$$(B, AC) = (B); \quad (A, BC) = (A)$$

are also fulfilled, we notice that

$$(ABC) = (A) \cdot (B, A) \cdot (C, AB) = (A) \cdot (B) \cdot (C)$$

because (C, AB) = (C) by hypothesis and (B, A) = (B) as proved. On the other
hand,

$$(ABC) = (A) \cdot (C, A) \cdot (B, AC)$$

and (C, A) = (C). Hence, comparing with the preceding expression,

$$(B, AC) = (B).$$

Similarly, it can be shown that

$$(A, BC) = (A).$$

The independence of four events A, B, C, D is assured if the following 11 conditions
are fulfilled:

$$(A, B) = (A, C) = (A, D) = (A); \quad (B, C) = (B, D) = (B); \quad (C, D) = (C);$$
$$(C, AB) = (C); \quad (D, AB) = (D, AC) = (D, BC) = (D); \quad (D, ABC) = (D).$$

And in general, independence of n events is assured if $2^n - n - 1$ conditions of
similar type are fulfilled.

If several events are independent, every two of them are independent; but this
does not suffice for the independence of all events, as can be shown by a simple example.
An urn contains four tickets with numbers 112, 121, 211, 222, and one ticket is
drawn. What are the probabilities that the first, second, or third digits in its number
are 1? Let a unit as the first, second, or third digit be represented, respectively,
by A, B, or C. Then

$$(A) = (B) = (C) = \frac{2}{4} = \frac{1}{2}.$$

Compound probabilities (AB), (AC), (BC) are

$$(AB) = (AC) = (BC) = \frac{1}{4},$$

since among four tickets there is only one whose number has first and second, or
first and third, or second and third digits of 1. Now, for instance,

$$(AB) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = (A) \cdot (B),$$

whence A and B are independent. Similarly, A and C; C and B are independent.
Thus, any two of the events A, B, C are independent, but not all three events are.
For, if they were, we should have

$$(ABC) = \frac{1}{8}.$$

But (ABC) = 0 since in no ticket are all three digits equal to 1.
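The ticket example is small enough to verify by listing the four cases. The sketch below is an added illustration; the function `prob` and the use of 0-based digit positions are conventions assumed for the example.

```python
from fractions import Fraction

tickets = ["112", "121", "211", "222"]


def prob(*positions: int) -> Fraction:
    """Probability that every listed digit position (0-based) shows a 1."""
    hits = sum(1 for t in tickets if all(t[i] == "1" for i in positions))
    return Fraction(hits, len(tickets))


A, B, C = 0, 1, 2  # first, second, third digit equal to 1

pairwise_independent = all(
    prob(x, y) == prob(x) * prob(y) for x, y in [(A, B), (A, C), (B, C)]
)
all_three_independent = prob(A, B, C) == prob(A) * prob(B) * prob(C)
```

Every pair satisfies the product rule, while the three events together do not, since the compound probability of all three is 0 and not 1/8.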

7. The theorems of total and compound probability form the founda- 
tion of the theory of probability as it represents a separate branch of 
mathematical science. They serve the purpose of finding probabilities 
in more complicated cases, either by being directly applied or by enabling 
us to form equations from which the required probabilities can be found. 
A few selected problems will illustrate the various ways of using these 
theorems. 

Problem 14. An urn contains a white balls and b black balls; another
contains c white and d black balls. One ball is transferred from the first
urn into the second, and then a ball is drawn from the latter. What is
the probability that it will be a white ball?

Solution. The event consisting in the white color of the ball drawn
from the second urn, can materialize under two mutually exclusive forms:
when the transferred ball is a white one, and when it is black. By the
theorem of total probability, we must find the probabilities corresponding
to these two forms. To find the probability of the first form, we observe
that it represents a compound event consisting in the white color of the
transferred ball, combined with the white color of the extracted ball.
The probability that the transferred ball is white is given by the fraction

$$\frac{a}{a + b}$$

and the probability that the ball removed from the second urn is white is

$$\frac{c + 1}{c + d + 1}$$

because before the drawing there were c + 1 white balls and d black
balls in the second urn. Hence, by the theorem of compound probability,
the probability of the first form is

$$\frac{a(c + 1)}{(a + b)(c + d + 1)}.$$

In the same way, we find that the probability of the second form is

$$\frac{bc}{(a + b)(c + d + 1)}$$

and the sum of these two numbers

$$\frac{ac + bc + a}{(a + b)(c + d + 1)}$$

gives the probability of extracting a white ball from the second urn, after
one ball of unknown color has been transferred from the first urn.
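The result of Problem 14 can be compared with a direct count of the (a + b)(c + d + 1) equally likely combinations of transferred and drawn ball. This is a sketch added for illustration; the function names are invented here.

```python
from fractions import Fraction


def closed_form(a: int, b: int, c: int, d: int) -> Fraction:
    """(ac + bc + a) / ((a + b)(c + d + 1)), the expression obtained above."""
    return Fraction(a * c + b * c + a, (a + b) * (c + d + 1))


def by_enumeration(a: int, b: int, c: int, d: int) -> Fraction:
    """Count white draws over every (transferred ball, drawn ball) pair."""
    urn1 = ["w"] * a + ["b"] * b
    urn2 = ["w"] * c + ["b"] * d
    favorable = total = 0
    for moved in urn1:
        for drawn in urn2 + [moved]:
            total += 1
            favorable += drawn == "w"
    return Fraction(favorable, total)
```

With a = 2, b = 1, c = 1, d = 2 both functions return 5/12.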

8. Problem 15. Two players agree to play under the following
conditions: Taking turns, they draw the balls out of an urn containing
a white balls and b black balls, one ball at a time. He who extracts the
first white one wins the game. What is the probability that the player
who starts will win the game?

Solution. Let A be the player who draws the first ball, and let B
be the other player. The game can be won by A, first, if he extracts a
white ball at the start; second, if A and B alternately extract 2 black
balls and then A draws a white one; third, if A and B alternately extract
4 black balls and the fifth ball drawn by A is white; and so on. By the
theorem of total probability, the probability for A to win the game
is the sum of the probabilities of the mutually exclusive ways (described
above) in which he can win the game. The probability of extracting a
white ball at first is

$$\frac{a}{a + b}.$$

The probability of extracting 2 black balls and then 1 white ball is found
by direct application of the theorem of compound probabilities. Its
expression is

$$\frac{b(b - 1)a}{(a + b)(a + b - 1)(a + b - 2)}.$$

The probability of extracting 4 black balls and then 1 white ball is given
by

$$\frac{b(b - 1)(b - 2)(b - 3)a}{(a + b)(a + b - 1)(a + b - 2)(a + b - 3)(a + b - 4)},$$

using the same theorem of compound probability.

In the same way we deal with all the possible and mutually exclusive
ways which would allow A to win the game. Then, by adding the above
given expressions of partial probabilities, we obtain the expression for the
required probability in the form of the sum

$$P = \frac{a}{a + b}\left[1 + \frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + \frac{b(b - 1)(b - 2)(b - 3)}{(a + b - 1)(a + b - 2)(a + b - 3)(a + b - 4)} + \cdots\right].$$

The law of formation of different terms in this sum is obvious; and
the sum automatically ends as soon as we arrive at a term which is equal
to zero.

In the same way, we can find that the probability for the player B
to win is expressed by an analogous sum:

$$Q = \frac{a}{a + b}\left[\frac{b}{a + b - 1} + \frac{b(b - 1)(b - 2)}{(a + b - 1)(a + b - 2)(a + b - 3)} + \cdots\right].$$

But one of the players, A or B, must win the game, and the winning of
the game by A and B are opposite events. Hence,

$$P + Q = 1$$

or, after substituting the above expressions for P and Q and after obvious
simplifications,

$$1 + \frac{b}{a + b - 1} + \frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + \cdots = \frac{a + b}{a}.$$

This is a noteworthy identity, obtained, as we see, by the principles
of the theory of probability. Of course, it can be proved in a direct
way, and it would be a good problem for students to attempt a direct
proof. There are many cases in which, by means of considerations
belonging to the theory of probability, several identities or inequalities
can be established whose direct proof sometimes involves considerable
difficulty.
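The sums for P and Q and the identity derived from them can be evaluated in exact arithmetic. The following is a minimal sketch added for illustration, assuming a is at least 1 so that a white ball is eventually drawn; the function names are not from the text.

```python
from fractions import Fraction


def win_probabilities(a: int, b: int):
    """Return (P, Q) for the first and second player; requires a >= 1."""
    wins = [Fraction(0), Fraction(0)]
    p_only_black_so_far = Fraction(1)
    for k in range(b + 1):          # k black balls drawn before the first white
        remaining = a + b - k
        # The k-th draw (0-based) belongs to player k % 2.
        wins[k % 2] += p_only_black_so_far * Fraction(a, remaining)
        p_only_black_so_far *= Fraction(b - k, remaining)
    return wins[0], wins[1]


def identity_lhs(a: int, b: int) -> Fraction:
    """1 + b/(a+b-1) + b(b-1)/((a+b-1)(a+b-2)) + ... until the terms vanish."""
    total, term = Fraction(0), Fraction(1)
    for k in range(b + 1):
        total += term
        if k < b:
            term *= Fraction(b - k, a + b - 1 - k)
    return total
```

For every admissible pair a, b the two probabilities add up to 1, and the left side of the identity equals (a + b)/a.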

9. Problem 16. Each of k urns contains n identical balls numbered
from 1 to n. One ball is drawn from every urn. What is the probability
that m is the greatest number drawn?

Solution. Let us denote by $P_m$ the required probability. It is not
apparent how we can find the explicit expression for this probability, but
using the theorems of total and compound probability, we can form
equations which yield the desired expression for $P_m$ without any difficulty.
To this end, let us first find the probability P that the greatest number
drawn does not exceed m. It is obvious that this may happen in m
mutually exclusive ways; namely, when the greatest number drawn is
1, 2, 3, and so on up to m. The probabilities of these different hypotheses
being $P_1, P_2, \ldots P_m$, their sum gives the following first expression for
P:

$$P = P_1 + P_2 + \cdots + P_m. \tag{1}$$

We can find the second expression for P using the theorem of compound
probability; namely, the greatest number drawn does not exceed
m if balls drawn from all urns have numbers from 1 to m. The probability
of drawing a ball with the number 1, 2, 3, . . . m from any urn is
m/n. And that this happens for every urn is a
compound event consisting of k independent events with the same
probability m/n. Therefore, by the theorem of compound probability

$$P = \frac{m^k}{n^k}.$$

And this compared with (1) gives the equation

$$P_1 + P_2 + \cdots + P_m = \frac{m^k}{n^k}. \tag{2}$$

Substituting m - 1 for m in this equation, we get

$$P_1 + P_2 + \cdots + P_{m-1} = \frac{(m - 1)^k}{n^k}$$

and it suffices to subtract this from (2) to have the required expression for
$P_m$:

$$P_m = \frac{m^k - (m - 1)^k}{n^k}.$$
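For small n and k the expression for the probability that m is the greatest number drawn can be confirmed by listing all n^k draws. The sketch below is an added illustration with invented function names.

```python
from fractions import Fraction
from itertools import product


def p_max_formula(m: int, n: int, k: int) -> Fraction:
    """(m**k - (m - 1)**k) / n**k, the expression obtained by subtraction."""
    return Fraction(m ** k - (m - 1) ** k, n ** k)


def p_max_enumeration(m: int, n: int, k: int) -> Fraction:
    """Count draws (one ball from each of k urns) whose greatest number is m."""
    draws = product(range(1, n + 1), repeat=k)
    hits = sum(1 for d in draws if max(d) == m)
    return Fraction(hits, n ** k)
```

Summed over m = 1, 2, . . . n the probabilities add up to 1, as they must for exhaustive and mutually exclusive cases.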

10. Problem 17. Two persons, A and B, have respectively n + 1
and n coins, which they toss simultaneously. What is the probability
that A will have more heads than B?

Solution. Let $\mu$, $\mu'$ and $\nu$, $\nu'$ be numbers of heads and tails thrown
by A and B, respectively, so that $\mu + \nu = n + 1$, $\mu' + \nu' = n$. The
required probability P is the probability of the inequality $\mu > \mu'$. The
probability 1 - P of the opposite event $\mu \le \mu'$ is at the same time
the probability of the inequality $\nu > \nu'$; that is, 1 - P is the probability
that A will throw more tails than B. By reason of symmetry 1 - P = P,
whence P = 1/2.
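The symmetry argument can be checked by counting cases for small n. The following sketch is added for illustration only; it counts, among the 2^(2n+1) equally likely results, those in which A shows more heads than B.

```python
from fractions import Fraction
from math import comb


def p_more_heads(n: int) -> Fraction:
    """Probability that A (n + 1 coins) shows more heads than B (n coins)."""
    favorable = sum(
        comb(n + 1, i) * comb(n, j)   # ways for A to show i heads and B to show j
        for i in range(n + 2)
        for j in range(n + 1)
        if i > j
    )
    return Fraction(favorable, 2 ** (2 * n + 1))
```

The count gives exactly 1/2 for each n tried.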

11. Problem 18. Three players A, B, and C agree to play a series of
games observing the following rules: two players participate in each game,
while the third is idle, and the game is to be won by one of them. The
loser in each game quits, and his place in the next game is taken by the
player who was idle. The player who succeeds in winning over both
of his opponents without interruption, wins the whole series of games.
Supposing that the probability for each player to win a single game is 1/2
and that the first game is played by A and B, find the probability for
A, B, and C, respectively, to win the whole series, if (a) the number of
games to be played is limited and may not exceed a given number n;
if (b) the number of games is unlimited.

Solution. Let $P_n$, $Q_n$, $R_n$ be the probabilities for A, B, and C, respectively,
to win a series of games when their number cannot exceed n. By
reason of symmetry, $P_n = Q_n$ so that it remains to find $P_n$ and $R_n$.
The player A can win the whole series of games in two mutually exclusive
ways: if he wins the first game, or if he loses the first game. Let the
probability of the first case be $p_n$ and that of the second $r_n$. Then

$$P_n = p_n + r_n.$$


A can win the whole series after winning the first game, in two mutually
exclusive ways: (a) if he wins over B and C in succession; (b) if he wins
the first game from B and loses the second game to C; then, if in the third
game C loses to B, and in the fourth game A wins over B and later wins
the whole series of not more than n - 3 games. Now, the probability
of case (a) is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$ by the theorem of compound probability;
that of case (b) by the same theorem is $\frac{1}{8}p_{n-3}$, and the total probability is

$$p_n = \frac{1}{4} + \frac{1}{8}p_{n-3}. \tag{1}$$

If A loses the first game to B, but wins the whole series, then in the
second game C wins over B while the third game is won by A, and not
more than n - 2 games are left to play. Hence,

$$r_n = \frac{1}{4}p_{n-2}. \tag{2}$$


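Since $p_n = 0$ for n < 2, equation (1) by successive
substitutions yields

$$p_{3k} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k-1}}\right)$$
$$p_{3k+1} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k-1}}\right)$$
$$p_{3k+2} = \frac{1}{4}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots + \frac{1}{8^{k}}\right)$$

or, in condensed form for an arbitrary n

$$p_n = \frac{2}{7}\left(1 - 8^{-\left[\frac{n+1}{3}\right]}\right),$$

denoting by [x] the greatest integer contained in x. Hence, by virtue of
(2) the general expression of $r_n$ will be

$$r_n = \frac{1}{14}\left(1 - 8^{-\left[\frac{n-1}{3}\right]}\right)$$

and that of $P_n$, $Q_n$

$$P_n = Q_n = \frac{5}{14} - \frac{2}{7} \cdot 8^{-\left[\frac{n+1}{3}\right]} - \frac{1}{14} \cdot 8^{-\left[\frac{n-1}{3}\right]}.$$

Finally, to find the probability for C to win, we observe that this can
happen only if C wins the second game; hence,

$$R_n = p_{n-1} = \frac{2}{7}\left(1 - 8^{-\left[\frac{n}{3}\right]}\right).$$

Since $P_n + Q_n + R_n < 1$, the difference

$$1 - P_n - Q_n - R_n = \frac{4}{7} \cdot 8^{-\left[\frac{n+1}{3}\right]} + \frac{1}{7} \cdot 8^{-\left[\frac{n-1}{3}\right]} + \frac{2}{7} \cdot 8^{-\left[\frac{n}{3}\right]}$$

represents the probability of a tie in n games. This probability decreases
rapidly when n increases, so that in a long series of games a tie is practically
impossible. If the number of games is not limited, the probabilities
P, Q, R for A, B, C, respectively, to win are obtained as limits of
$P_n$, $Q_n$, $R_n$, when n increases indefinitely. Thus

$$P = Q = \frac{5}{14}, \qquad R = \frac{2}{7}.$$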
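The probabilities in Problem 18 can also be obtained by following the series game by game in exact arithmetic. The sketch below is an illustration added here, not the author's method: players A, B, C are numbered 0, 1, 2, and the closed expressions it is compared with are $5/14 - (2/7) \cdot 8^{-[(n+1)/3]} - (1/14) \cdot 8^{-[(n-1)/3]}$ for A and B and $(2/7)(1 - 8^{-[n/3]})$ for C, which follow from the recurrences $p_n = 1/4 + p_{n-3}/8$ and the relation of the second case to $p_{n-2}$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def series_probabilities(n: int):
    """Chances for A, B, C to win within n games, and the chance of a tie."""
    win = [Fraction(0)] * 3
    # State: (winner of the previous game or None, pair about to play) -> probability
    states = {(None, (0, 1)): Fraction(1)}
    for _ in range(n):
        nxt = {}
        for (prev, (x, y)), p in states.items():
            idle = 3 - x - y
            for winner in (x, y):
                if winner == prev:
                    # Two wins without interruption: the series is decided.
                    win[winner] += p * HALF
                else:
                    key = (winner, (winner, idle))
                    nxt[key] = nxt.get(key, Fraction(0)) + p * HALF
        states = nxt
    tie = sum(states.values(), Fraction(0))
    return win[0], win[1], win[2], tie


def closed_form(n: int):
    """Closed expressions for the three winning chances, valid for n >= 1."""
    p = (Fraction(5, 14)
         - Fraction(2, 7) / 8 ** ((n + 1) // 3)
         - Fraction(1, 14) / 8 ** ((n - 1) // 3))
    r = Fraction(2, 7) * (1 - Fraction(1, 8 ** (n // 3)))
    return p, p, r
```

As n grows the computed values approach 5/14 for A and B and 2/7 for C, while the chance of a tie tends to zero.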


Problems for Solution 

1. Three urns contain respectively 1 white and 2 black balls; 3 white and 1 black
ball; 2 white and 3 black balls. One ball is taken from each urn. What is the probability
that among the balls drawn there are 2 white and 1 black? Ans.

2. Cards are drawn one by one from a full deck. What is the probability that
10 cards will precede the first ace? Ans. 0.03938.

3. Urn 1 contains 10 white and 3 black balls; urn 2 contains 3 white and 5 black
balls. Two balls are transferred from No. 1 and placed in No. 2 and then one ball is
taken from the latter. What is the probability that it is a white ball? Ans. 59/130.

4. Two urns identical in appearance contain respectively 3 white and 2 black balls;
2 white and 5 black balls. One urn is selected and a ball taken from it. What is the
probability that this ball is white? Ans. 31/70.

5. What is the probability that 5 tickets drawn in the French lottery all have one-digit
numbers? Ans. $\frac{7}{2441626} = 29 \cdot 10^{-7}$.

6. What is the probability that each of the four players in a bridge game will get a
complete suit of cards? Ans. $24 \cdot \frac{(1 \cdot 2 \cdots 13)^4}{1 \cdot 2 \cdots 52} = 4.474 \cdot 10^{-28}$.

7. What is the probability that at least one of the players in a bridge game will
get a complete suit of cards?

Ans. $\frac{16 \cdot 13! \cdot 39! - 72 \cdot (13!)^2 \cdot 26! + 72 \cdot (13!)^4}{52!} = 2.52 \cdot 10^{-11}$.

See Sec. 5, page 31.

8. From an urn with a white and b black balls n balls are taken. Find the probability
of drawing at least one white ball. Ans. The required probability can be
expressed in two ways. First expression:

$$1 - \frac{b(b - 1) \cdots (b - n + 1)}{(a + b)(a + b - 1) \cdots (a + b - n + 1)}.$$

Second expression:

9. Three players A, B, C in turn draw balls from an urn with 10 white and 10 black
balls, taking one ball at a time. He who extracts the first white ball wins the game.
Supposing that they start in the order A, B, C, find the probabilities for each of them
to win the game. Ans. For A, 0.56584; for B, 0.29144; for C, 0.14271.

10. If n dice are thrown at a time, what is the probability of having each of the
points 1, 2, . . . 6, appear at least once? Find the numerical value of this probability
for n = 10. Ans.

$$p_n = 1 - 6\left(\frac{5}{6}\right)^n + 15\left(\frac{2}{3}\right)^n - 20\left(\frac{1}{2}\right)^n + 15\left(\frac{1}{3}\right)^n - 6\left(\frac{1}{6}\right)^n;$$
$$p_{10} = 0.2718.$$

Hint: Use the formula in Sec. 5, page 31.

11. In a lottery m tickets are drawn at a time out of the total number of n tickets,
and returned before the next drawing is made. What is the probability that in h
drawings each of the numbers 1, 2, . . . n will appear at least once? Ans.

$$1 - \frac{n}{1}\left(\frac{n - m}{n}\right)^h + \frac{n(n - 1)}{1 \cdot 2}\left(\frac{n - m}{n}\right)^h\left(\frac{n - m - 1}{n - 1}\right)^h - \cdots.$$

12. We have k varieties of objects, each variety consisting of the same number of
objects. These objects are drawn one at a time and replaced before the next drawing.
Find the probability that n and no less drawings will be required to produce objects of
all varieties. Ans.

$$k^{n-1}p_n = (k - 1)^{n-1} - \frac{k - 1}{1}(k - 2)^{n-1} + \frac{(k - 1)(k - 2)}{1 \cdot 2}(k - 3)^{n-1} - \cdots.$$
13. Three urns contain respectively 1 white, 2 black balls; 2 white, 1 black balls;
2 white, 2 black balls. One ball is transferred from the first urn into the second; then
one from the latter is transferred into the third; finally, one ball is drawn from the
third urn. What is the probability of its being white? Ans. 31/60.

14. Each of n urns contains a white and b black balls. One ball is transferred
from the first urn into the second, then one ball from the latter into the third, and so
on. Finally, one ball is taken from the last urn. What is the probability of its being
white? Ans. Denote by $p_k$ the probability of drawing a white ball from the kth urn.
Then

$$p_{k+1} = \frac{a+1}{a+b+1} p_k + \frac{a}{a+b+1}(1 - p_k)$$

for k = 1, 2, . . . n − 1. Hence,

$$p_n = \frac{a}{a+b}.$$
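
The recurrence of Prob. 14 can be iterated directly to confirm that the probability stays at a/(a + b) for every urn. This is only an illustrative sketch; the function name `last_urn_white` is an assumption of ours.

```python
from fractions import Fraction


def last_urn_white(a: int, b: int, n: int) -> Fraction:
    """Probability of a white ball from the last of n urns (a white, b black each).

    Iterates p_{k+1} = (a+1)/(a+b+1) * p_k + a/(a+b+1) * (1 - p_k),
    starting from p_1 = a/(a+b).
    """
    p = Fraction(a, a + b)
    for _ in range(n - 1):
        p = Fraction(a + 1, a + b + 1) * p + Fraction(a, a + b + 1) * (1 - p)
    return p


if __name__ == "__main__":
    print(last_urn_white(3, 5, 7))
```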


15. Two players A and B toss two dice, A starting the game. The game is won
by A if he casts 6 points before B casts 7 points; and it is won by B if he casts 7 points
before A casts 6 points. What are the probabilities for A and B to win the game if
they agree to cast dice not more than n times? What is the probability of a tie?
Ans. Probability for A:

$$p_n = \tfrac{30}{61}\left[1 - \left(\tfrac{155}{216}\right)^m\right] \quad \text{if } n = 2m$$

$$p_n = \tfrac{30}{61}\left[1 - \left(\tfrac{155}{216}\right)^{m+1}\right] \quad \text{if } n = 2m + 1.$$

Probability for B:

$$q_n = \tfrac{31}{61}\left[1 - \left(\tfrac{155}{216}\right)^m\right] \quad \text{if } n = 2m$$

$$q_n = \tfrac{31}{61}\left[1 - \left(\tfrac{155}{216}\right)^m\right] \quad \text{if } n = 2m + 1.$$

Probability of a tie:

$$r_n = \left(\tfrac{155}{216}\right)^m \text{ if } n = 2m; \quad r_n = \tfrac{31}{36}\left(\tfrac{155}{216}\right)^m \text{ if } n = 2m + 1.$$

If n increases indefinitely, $r_n$ converges to 0 and $p_n$, $q_n$ converge to the limits

$$p = \tfrac{30}{61}; \quad q = \tfrac{31}{61}$$

which may be considered as the probabilities for A and B to win if the number of
throws is unlimited.
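
The closed forms for this game can be checked by following it throw by throw. The sketch below assumes the standard two-dice probabilities 5/36 for 6 points and 6/36 for 7 points; the function name `game_probabilities` is illustrative.

```python
from fractions import Fraction

P_SIX = Fraction(5, 36)    # probability of casting 6 points with two dice
P_SEVEN = Fraction(6, 36)  # probability of casting 7 points with two dice


def game_probabilities(n: int):
    """Return (A wins, B wins, tie) when at most n throws are made, A starting."""
    a_wins = b_wins = Fraction(0)
    undecided = Fraction(1)
    for throw in range(1, n + 1):
        if throw % 2 == 1:            # A throws on odd-numbered throws
            a_wins += undecided * P_SIX
            undecided *= 1 - P_SIX
        else:                         # B throws on even-numbered throws
            b_wins += undecided * P_SEVEN
            undecided *= 1 - P_SEVEN
    return a_wins, b_wins, undecided


if __name__ == "__main__":
    a, b, tie = game_probabilities(200)
    print(float(a), float(b), float(tie))  # close to 30/61, 31/61 and 0
```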

16. The game known as "craps" is played with two dice, and the caster wins
unconditionally if he produces 7 or 11 points (which are called "naturals"); he loses
the game in case of 2, 3, or 12 points (called "craps"). But if he produces 4, 5, 6, 8, 9,
or 10 points, he has the right to cast the dice steadily until he throws the same number
of points he had before or until he throws a 7. If he rolls 7 before obtaining his
point, he loses the game; otherwise, he wins. What is the probability to win?

Ans. 244/495 = 0.493.
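
The answer can be reproduced by enumerating the 36 outcomes of two dice. This is a short sketch, not the author's derivation; it uses the fact that a point s is made before a 7 with probability P(s) / (P(s) + P(7)).

```python
from fractions import Fraction
from itertools import product

# Distribution of the sum of two fair dice.
counts = {}
for first, second in product(range(1, 7), repeat=2):
    counts[first + second] = counts.get(first + second, 0) + 1
prob = {total: Fraction(c, 36) for total, c in counts.items()}


def craps_win_probability() -> Fraction:
    """Probability that the caster wins under the rules stated above."""
    win = prob[7] + prob[11]                      # naturals on the first throw
    for point in (4, 5, 6, 8, 9, 10):
        # The point must reappear before a 7 does.
        win += prob[point] * prob[point] / (prob[point] + prob[7])
    return win


if __name__ == "__main__":
    print(craps_win_probability())
```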

17. Prove directly the identity in Prob. 15, page 37.

Solution 1. Let

$$\varphi(c, b) = \frac{b}{c} + \frac{b(b-1)}{c(c-1)} + \frac{b(b-1)(b-2)}{c(c-1)(c-2)} + \cdots$$

where b is a positive integer and c > b. Then

$$\varphi(c, b) = \frac{b}{c}\left[1 + \varphi(c-1, b-1)\right]$$

whence

$$\varphi(c, 1) = \frac{1}{c}; \quad \varphi(c, 2) = \frac{2}{c-1}; \quad \varphi(c, 3) = \frac{3}{c-2}$$

and in general

$$\varphi(c, b) = \frac{b}{c-b+1}.$$

Taking $c = a + b - 1$, we have

$$1 + \varphi(a+b-1, b) = \frac{a+b}{a}.$$

Solution 2. The polynomial

$$S(x) = 1 + \frac{b}{c}x + \frac{b(b-1)}{c(c-1)}x^2 + \cdots$$

can be presented in the form of a definite integral

$$S(x) = (c+1)\int_0^1 [1 - \xi(1-x)]^b (1-\xi)^{c-b}\, d\xi$$

whence

$$S(1) = (c+1)\int_0^1 (1-\xi)^{c-b}\, d\xi = \frac{c+1}{c-b+1} = \frac{a+b}{a}$$

if $c = a + b - 1$.

18. Find the approximate expressions for the probabilities P and Q in Prob. 15,
page 36, when b is a large number. Take for numerical application a = b = 50.

Solution. Since P + Q = 1, it suffices to seek the approximate expression for
P − Q. Now, with S(x) as in Solution 2 of Prob. 17 and $c = a + b - 1$,

$$P - Q = \frac{a}{a+b} S(-1)$$

whence

$$P - Q = a\int_0^1 (1-2u)^b (1-u)^{a-1}\, du.$$

To find the approximate expression of this integral, we set

$$(1-2u)^b (1-u)^{a-1} = e^{-v}$$

whence u can be expressed as a power series in v:

$$u = \frac{v}{2b+a-1} - \frac{4b+a-1}{(2b+a-1)^3} \cdot \frac{v^2}{2} + \frac{12b^2 + (2b+a-1)^2}{3(2b+a-1)^5} \cdot \frac{v^3}{2} + \cdots$$

Substituting the resulting expression of du/dv and integrating with respect to v
between limits 0 and ∞, we obtain for P − Q an asymptotic expansion whose first
terms are




a 

26 + a - 1 


1 


46 + g - 1 g[1262 + (26 + g - l)^] (-1)^ 

(26 + g - 1)2] (26 + g - 1)5 


A more detailed discussion reveals that the error of this approximate formula is less 

1 j .X, «W0(g - 1)2 - 66(a - 1) + 3262] , 

than a(J^)^+i(^£)““-i and greater than ; provided 

(26 +0^ ~ 1)5 


b ≥ 12. For a = b = 50 the formula yields

$$P - Q \approx 0.3318; \quad P = 0.6659; \quad Q = 0.3341.$$
CHAPTER III
REPEATED TRIALS

1. In the theory of probability the word "trial" means an attempt to
produce, in a manner precisely described, an event E which is not certain.
The outcome of a trial is called a "success" if E occurs, and a "failure" if
E fails to occur. For instance, if E represents the drawing of two cards
of the same denomination from a full pack of cards, the "trial" consists
in taking any two cards from the full pack, and we have a success or
failure in this trial according to whether both cards are of the same
denomination or not.

If trials can be repeated, they form a “series” of trials. Regarding 
series of trials, the following two problems naturally arise: 

a. What is the probability of a given number of successes in a given 
series of trials? And as a generalization of this problem: 

b. What is the probability that the number of successes will be
contained between two given limits in a given series of trials? 

Problems of this kind are among the most important in the theory of 
probability. 

2. Trials are said to be “independent” in regard to an event E if 
the probability of this event in any trial remains the same, whether 
the results of any number of other trials are known or not. On the other 
hand, trials are “dependent” if the probability of E in a certain trial 
varies according to the information we have about the outcome of one or 
more of the other trials. 

As an example of independent trials, imagine that several times in 
succession we draw one ball from an urn containing white and black balls 
in given proportion, after each trial returning the ball that has been 
drawn, and thoroughly mixing the balls before proceeding to the next 
trial. With respect to the color of the balls taken, we may reasonably 
assume that these trials are independent. On the other hand, if the 
balls already extracted are not returned to the urn, the above described
trials are no longer independent. To illustrate, suppose that the urn
from which the balls are drawn originally contained 2 white and 3 black
balls, and that 4 balls are drawn. What is the probability that the
third ball is white? If nothing is known about the color of the three
other balls, the probability is 2/5. If we know that the first ball is white,
but the colors of the second and fourth balls are unknown, this probability
is 1/4. In general, the probability for any ball to be white (or black)
depends essentially on the amount of information we possess about the
color of the other balls. Since the urn contains a limited number of
balls, series of trials of this kind cannot be continued indefinitely.

As an example of an indefinite series of dependent trials, suppose that 
we have two urns, the first containing 1 white and 2 black balls, and the 
second, 1 black and 2 white balls, and the trials consist in taking one 
ball at a time from either urn, observing the following rules: (a) the 
first ball is taken from the first urn; (b) after a white ball, the next is
taken from the first urn; after a black one, the next is taken from the 
second urn; (c) balls are returned to the same urns from which they were 
taken. 

Following these rules, we evidently have a definite series of trials, 
which can be extended indefinitely, and these trials are dependent. 
For if we know that a certain ball was white or black, the probability
of the next ball being white is 1/3 or 2/3, respectively.

Assuming the independence of trials, the probability of an event E 
may remain constant or may vary from one trial to another. If an 
unbiased coin is tossed several times, we have a series of independent 
trials each with the same probability, 1/2, for heads. It is easy to give
an example of a series of independent trials with variable probability for 
the same event. Imagine, for instance, that we have an unlimited 
number of urns with white and black balls, but that the proportion of 
white and black balls varies from urn to urn. One ball is drawn suc- 
cessively from each of these urns. Evidently, here we have a series of 
trials independent in regard to the white color of the ball drawn, but 
with the probability of drawing a ball of this color varying from trial to 
trial. 

In this chapter we shall discuss the simplest case of series of independent
trials with constant probability. They are often called "Bernoullian
series of trials" in honor of Jacob Bernoulli who, in his classical
book, "Ars conjectandi" (1713), made a profound study of such series
and was led to the discovery of one of the most important theorems in
the theory of probability.

3. Considering a series of n independent trials in which the probability
of an event E is p in every trial (that of the opposite event F being
q = 1 − p), the first problem which presents itself is to find the probability
that E will occur exactly m times, where m is one of the numbers
0, 1, 2, . . . n. In what follows, we shall denote this probability by $T_m$.
In the extreme cases m = n and m = 0 it is easy to find $T_n$ and $T_0$.
When m = n, the event E must occur n times in succession, so that $T_n$
represents the probability of the compound event EEE . . . E with n
identical components. These components are independent events, since
the trials are independent, and the probability of each of them is p.
Hence, the compound probability is

$$T_n = p \cdot p \cdot p \cdots p \quad (n \text{ times})$$

or

$$T_n = p^n.$$

The symbol $T_0$ denotes the probability that E will never occur in n
trials, which is the same as to say that F will occur n times in succession.
Hence, for the same reasons as before,

$$T_0 = q^n = (1 - p)^n.$$

When m is neither 0 nor n, the event consisting in m occurrences of E
can materialize in several mutually exclusive forms, each of which may
be represented by a definite succession of m letters E and n − m letters F.
For example, if n = 4 and m = 2, we can distinguish the following mutually
exclusive forms corresponding to two occurrences of E:

EEFF, EFEF, EFFE, FEEF, FEFE, FFEE.

To find the number of all the different successions consisting of m
letters E and n − m letters F, we observe that any such succession is
determined as soon as we know the places occupied by the letter E.
Now the number of ways to select m places out of the total number of
n places is evidently the number of combinations out of n objects taken
m at a time. Hence, the number of mutually exclusive ways to have
m successes in n trials is

$$\frac{n(n-1) \cdots (n-m+1)}{1 \cdot 2 \cdot 3 \cdots m}.$$

The probability of each succession of m letters E and n − m letters F,
by reason of independence of trials, is represented by the product of
m factors p and n − m factors q, and since the product does not depend
upon the order of factors, this probability will be

$$p^m q^{n-m}$$

for each succession. Hence, the total probability of m successes in n
trials is given by this simple formula:

$$T_m = \frac{n(n-1) \cdots (n-m+1)}{1 \cdot 2 \cdot 3 \cdots m}\, p^m q^{n-m} \tag{1}$$

which can also be presented thus:

$$T_m = \frac{n!}{m!(n-m)!}\, p^m q^{n-m}. \tag{2}$$
This second form can be used even for m = 0 or m = n if, as usual,
we assume 0! = 1. Either of the expressions (1) or (2) shows that $T_m$
may be considered as the coefficient of $t^m$ in the expansion of

$$(q + pt)^n$$

according to ascending powers of an arbitrary variable t. In other
words, we have identically

$$(q + pt)^n = T_0 + T_1 t + T_2 t^2 + \cdots + T_n t^n.$$

For this reason the function

$$(q + pt)^n$$

is called the "generating function" of probabilities $T_0, T_1, T_2, \ldots T_n$.
By setting t = 1 we naturally obtain

$$T_0 + T_1 + T_2 + \cdots + T_n = 1.$$
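
The connection between the probabilities of m successes and the generating function can be illustrated by expanding the n-th power of q + pt one factor at a time and comparing the coefficients with the factorial formula. This is a sketch in exact arithmetic; both function names are ours.

```python
from fractions import Fraction
from math import comb


def binomial_probabilities(n: int, p: Fraction):
    """T_0 .. T_n from the factorial formula n!/(m!(n-m)!) p^m q^(n-m)."""
    q = 1 - p
    return [comb(n, m) * p ** m * q ** (n - m) for m in range(n + 1)]


def generating_function_coefficients(n: int, p: Fraction):
    """Coefficients of (q + p t)^n, built by n successive multiplications."""
    q = 1 - p
    coeffs = [Fraction(1)]
    for _ in range(n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for power, value in enumerate(coeffs):
            nxt[power] += value * q        # contribution of the term q
            nxt[power + 1] += value * p    # contribution of the term p t
        coeffs = nxt
    return coeffs


if __name__ == "__main__":
    p = Fraction(1, 3)
    print(binomial_probabilities(4, p) == generating_function_coefficients(4, p))
    print(sum(binomial_probabilities(4, p)))   # setting t = 1 gives total 1
```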

The probability P(k, l) that the number of successes m will satisfy
the inequalities (or, simply, the probability of these inequalities)

$$k \leq m \leq l$$

where k and l are two given integers, can easily be found by distinguishing
the following mutually exclusive events:

$$m = k \text{ or } m = k + 1, \ldots \text{ or } m = l.$$

Accordingly, by the theorem of total probability,

$$P(k, l) = T_k + T_{k+1} + \cdots + T_l$$

or, using expression (2),

$$P(k, l) = \sum_{m=k}^{l} \frac{n!}{m!(n-m)!}\, p^m q^{n-m}.$$

In particular, the probability that the number of successes will not
be greater than l is represented by the sum

$$P(0, l) = q^n + \frac{n}{1} p q^{n-1} + \frac{n(n-1)}{1 \cdot 2} p^2 q^{n-2} + \cdots + \frac{n(n-1) \cdots (n-l+1)}{1 \cdot 2 \cdots l}\, p^l q^{n-l}.$$

Similarly, the probability that the number of successes in n trials will
not be less than l can be presented thus:

$$P(l, n) = \frac{n(n-1) \cdots (n-l+1)}{1 \cdot 2 \cdots l}\, p^l q^{n-l} \left[1 + \frac{n-l}{l+1} \cdot \frac{p}{q} + \frac{(n-l)(n-l-1)}{(l+1)(l+2)} \left(\frac{p}{q}\right)^2 + \cdots \right]$$

where the series in the brackets ends by itself.

5. The application of the above established formulas to numerical 
examples does not present any difficulty so long as the numbers with 
which we have to deal are not large. 


Example 1. In tossing 10 coins, what is the probability of having exactly 5 heads?
Tossing 10 different coins at once is the same thing as tossing one coin 10 times, if all
the coins are unbiased, which is assumed. Hence, the required probability is given
by formula (1), where we must take n = 10, m = 5, and it is

$$\frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} \cdot \frac{1}{2^{10}} = \frac{252}{1024} = 0.24609.$$


Example 2. If a person playing a certain game can win $1 with the probability
1/3 and lose twenty-five cents with the probability 2/3, what is the probability of
winning at least $3 in 20 games? Let m be the number of times the game is won. The
total gain (considering a loss as a negative gain) will be

$$m - \tfrac{1}{4}(20 - m) = \tfrac{5}{4}m - 5 \text{ dollars}$$

and the condition of the problem requires that it should not be less than $3. Hence

$$\tfrac{5}{4}m - 5 \geq 3,$$

whence m ≥ 32/5 or, since m is an integer, m ≥ 7. That is, in 20 trials an event with
the probability 1/3 must happen at least 7 times and the probability for that is:

$$\sum_{m=7}^{20} \frac{20!}{m!(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m}.$$

This sum contains 14 terms; but it can be expressed through another sum containing
only 7 terms, because

$$\sum_{m=7}^{20} \frac{20!}{m!(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m} = 1 - \sum_{m=0}^{6} \frac{20!}{m!(20-m)!}\left(\frac{1}{3}\right)^m\left(\frac{2}{3}\right)^{20-m}.$$

Using the last expression, one easily gets 0.5207 for the required probability.
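
Both examples can be recomputed in a few lines. The sketch below sums exact binomial terms; `probability_between` is a helper name of our own choosing.

```python
from fractions import Fraction
from math import comb


def probability_between(n: int, p: Fraction, k: int, l: int) -> Fraction:
    """Probability that the number of successes m satisfies k <= m <= l."""
    q = 1 - p
    return sum(comb(n, m) * p ** m * q ** (n - m) for m in range(k, l + 1))


# Example 1: exactly 5 heads in 10 tosses of an unbiased coin.
example_1 = probability_between(10, Fraction(1, 2), 5, 5)

# Example 2: at least 7 successes in 20 trials with probability 1/3,
# evaluated through the shorter complementary sum over m = 0..6.
example_2 = 1 - probability_between(20, Fraction(1, 3), 0, 6)

if __name__ == "__main__":
    print(float(example_1), float(example_2))
```

The complementary form used for Example 2 mirrors the text's remark that 7 terms suffice instead of 14.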

6. In the series of probabilities

$$T_0, T_1, T_2, \ldots T_n$$

for 0, 1, 2, . . . n successes in n trials, the terms generally increase till
the greatest term $T_\mu$ is reached, and then they steadily decrease. For
instance, if n = 10, p = q = 1/2, the values of the expression

$$2^{10}\, T_m$$

for m = 0, 1, 2, . . . 10 are

1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1

so that $T_5$ is the greatest term. For obvious reasons the number $\mu$ (to
which the greatest term $T_\mu$ in the series of probabilities $T_0, T_1, \ldots T_n$
corresponds) is called the "most probable" number of successes.

To prove this observation in general, and to find the rule for obtaining
$\mu$, we observe first that the quotient

$$\frac{T_{m+1}}{T_m} = \frac{n-m}{m+1} \cdot \frac{p}{q}$$

decreases with increasing m, so that

$$\frac{T_1}{T_0} > \frac{T_2}{T_1} > \cdots > \frac{T_n}{T_{n-1}}. \tag{a}$$

The two extreme terms in (a) are

$$\frac{T_1}{T_0} = \frac{np}{q}; \quad \frac{T_n}{T_{n-1}} = \frac{p}{nq}$$

and if n is large enough, the first of them is > 1 and the last < 1. To
find exactly how large n must be, we notice that

$$\frac{T_1}{T_0} > 1$$

if

$$np > q = 1 - p$$

whence

$$n + 1 > \frac{1}{p}.$$

Similarly,

$$\frac{T_n}{T_{n-1}} < 1$$

if

$$p < nq \quad \text{or} \quad 1 - q < nq$$

whence

$$n + 1 > \frac{1}{q}.$$

Consequently, if n + 1 is greater than both 1/p and 1/q, the first term
in (a) is > 1 and the last term is < 1. As the terms of (a) form a decreasing
sequence, there must be a last term which is ≥ 1. Let it be

$$\frac{T_\mu}{T_{\mu-1}}.$$

Then

$$\frac{T_1}{T_0} > \frac{T_2}{T_1} > \cdots > \frac{T_\mu}{T_{\mu-1}} \geq 1$$

and

$$1 > \frac{T_{\mu+1}}{T_\mu} > \frac{T_{\mu+2}}{T_{\mu+1}} > \cdots > \frac{T_n}{T_{n-1}}$$

or, which is the same,

$$T_0 < T_1 < T_2 < \cdots < T_{\mu-1} \leq T_\mu$$

$$T_\mu > T_{\mu+1} > T_{\mu+2} > \cdots > T_n.$$

In other words, the sequence of probabilities increases till the greatest
term $T_\mu$ is reached and steadily decreases from then on. Besides $T_\mu$
there may be another greatest term $T_{\mu-1}$; namely, when $T_{\mu-1} = T_\mu$;
but all the other terms are certainly less than $T_\mu$. The number $\mu$ is
perfectly determined by the conditions

$$\frac{T_\mu}{T_{\mu-1}} = \frac{n-\mu+1}{\mu} \cdot \frac{p}{q} \geq 1; \quad \frac{T_{\mu+1}}{T_\mu} = \frac{n-\mu}{\mu+1} \cdot \frac{p}{q} < 1$$

which are equivalent to the two inequalities

$$(n+1)p \geq \mu(p+q), \quad np - q < \mu(p+q).$$

These in turn can be presented thus:

$$\mu \leq (n+1)p < \mu + 1$$

and show that $\mu$ is uniquely determined as the greatest integer contained in
(n + 1)p. If (n + 1)p is an integer, then $\mu = (n+1)p$ and $T_\mu = T_{\mu-1}$.
That is, there are two greatest terms if, and only if, (n + 1)p is an
integer.

Let us consider now what happens if

$$n + 1 \leq \frac{1}{p} \quad \text{or} \quad n + 1 \leq \frac{1}{q}.$$

In the first case, all the terms in (a) are less than 1 with the single exception
of the first term $T_1/T_0$ which may be equal to 1; namely, when
n + 1 = 1/p. Consequently,

$$T_0 \geq T_1 > T_2 > \cdots > T_n$$

so that $T_0$ is the greatest term. If (n + 1)p < 1 the greatest integer
contained in (n + 1)p is 0, and there is only one greatest term $T_0$. If,
however, (n + 1)p = 1, there are two terms $T_0 = T_1$ greater than
others.

If (n + 1)q ≤ 1, all the terms in series (a) are > 1 with the exception
of the last term, which may be equal to 1; namely, when (n + 1)q = 1.
Hence,

$$T_0 < T_1 < \cdots < T_{n-1} \leq T_n$$

so that $T_n$ is the greatest term, and the preceding term $T_{n-1}$ can be equal
to it only if (n + 1)q = 1. Now the condition

$$(n + 1)q \leq 1$$

is equivalent to

$$(n + 1)p \geq n.$$

On the other hand, because p < 1,

$$(n + 1)p < n + 1.$$

Therefore n is the greatest integer contained in (n + 1)p.

Comparing the results obtained in the last two cases (excluded at
first) with the general rule, we see that in all cases the greatest term
$T_\mu$ corresponds to

$$\mu = [(n + 1)p].$$

If (n + 1)p is an integer, then there are two greatest terms $T_\mu$ and $T_{\mu-1}$.
This rule for determining the most probable number of successes is very
simple and easy of application to numerical examples.


Example 1. Let n = 20, p = 2/5, q = 3/5. Then (n + 1)p = 8.4, and the greatest
integer contained in this number is $\mu = 8$. Hence, there is only one most probable
number of successes $\mu = 8$ with the corresponding probability

$$T_8 = \frac{20!}{8!\,12!}\left(\frac{2}{5}\right)^8\left(\frac{3}{5}\right)^{12} = 0.1797.$$

Example 2. Let n = 110, p = 1/3, q = 2/3, and (n + 1)p = 37, an integer.
Consequently, 36 and 37 are the most probable numbers of successes with the
corresponding probability

$$T_{36} = T_{37} = \frac{110!}{37!\,73!}\left(\frac{1}{3}\right)^{37}\left(\frac{2}{3}\right)^{73} = 0.0801.$$
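
The rule for the most probable number of successes is easy to test against a direct search for the largest term. The following sketch assumes p is given as an exact fraction strictly between 0 and 1; the function names are ours.

```python
from fractions import Fraction
from math import comb, floor


def binomial_term(n: int, m: int, p: Fraction) -> Fraction:
    """Probability of exactly m successes in n trials."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)


def most_probable_successes(n: int, p: Fraction):
    """Most probable number(s) of successes: the greatest integer in (n + 1)p."""
    x = (n + 1) * p
    mu = floor(x)
    if x == mu and mu >= 1:
        return [mu - 1, mu]       # (n + 1)p is an integer: two greatest terms
    return [mu]


def most_probable_by_search(n: int, p: Fraction):
    """Brute-force check: positions of the largest term among T_0 .. T_n."""
    terms = [binomial_term(n, m, p) for m in range(n + 1)]
    best = max(terms)
    return [m for m, t in enumerate(terms) if t == best]


if __name__ == "__main__":
    print(most_probable_successes(20, Fraction(2, 5)))    # Example 1
    print(most_probable_successes(110, Fraction(1, 3)))   # Example 2
```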


7. When n, m, and n − m are large numbers, the evaluation of
probability $T_m$ by the exact formula

$$T_m = \frac{n!}{m!(n-m)!}\, p^m q^{n-m}$$

becomes impracticable and it is necessary to resort to approximations.
For approximate evaluation of large factorials we possess precious means
in the famous "Stirling formula." Referring the reader to Appendix I
where this formula is established, we shall use it here in the following
form:

$$\log x! = \log\sqrt{2\pi x} + x \log x - x + \omega(x)$$

where

$$\frac{1}{12\left(x + \frac{1}{2}\right)} < \omega(x) < \frac{1}{12x}.$$

In the same appendix the following double inequality is proved:

$$\frac{1}{12n} - \frac{1}{12m} - \frac{1}{12l} < \omega(n) - \omega(m) - \omega(l) < \frac{1}{12n+6} - \frac{1}{12m+6} - \frac{1}{12l+6}.$$

Now from Stirling's formula

$$n! = \sqrt{2\pi n}\; n^n e^{-n+\omega(n)}$$

and two similar expressions for m! and (n − m)! follow. Substituting
them into $T_m$, we get two limits

$$T_m < \sqrt{\frac{n}{2\pi m(n-m)}} \left(\frac{np}{m}\right)^m \left(\frac{nq}{n-m}\right)^{n-m} e^{\frac{1}{12n+6} - \frac{1}{12m+6} - \frac{1}{12(n-m)+6}} \tag{3}$$

$$T_m > \sqrt{\frac{n}{2\pi m(n-m)}} \left(\frac{np}{m}\right)^m \left(\frac{nq}{n-m}\right)^{n-m} e^{\frac{1}{12n} - \frac{1}{12m} - \frac{1}{12(n-m)}}. \tag{4}$$

When n, m, n − m are even moderately large, the exponential factors in (3)
and (4) differ little from each other.

Inequalities (3) and (4) then give very close upper and lower limits
for $T_m$. To evaluate powers

$$\left(\frac{np}{m}\right)^m, \quad \left(\frac{nq}{n-m}\right)^{n-m}$$

with large exponents, sufficiently extensive logarithmic tables must be
available. If such tables are lacking, then in cases which ordinarily
occur when ratios np/m and nq/(n − m) are close to 1, we can use
special short tables to evaluate logarithms of these ratios or else resort to
series.
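
Where logarithmic tables were once needed, the two limits of this section can now be evaluated directly. The sketch below works with logarithms to avoid overflow and assumes the correction exponents 1/(12n+6) − 1/(12m+6) − 1/(12(n−m)+6) for the upper limit and 1/(12n) − 1/(12m) − 1/(12(n−m)) for the lower one, as read from the double inequality quoted from the appendix. The function name `stirling_limits` is ours.

```python
from math import comb, exp, log, pi


def stirling_limits(n: int, m: int, p: float):
    """Lower and upper limits for T_m (requires 0 < m < n and 0 < p < 1)."""
    q = 1.0 - p
    k = n - m
    # Logarithm of the common factor shared by both limits.
    log_main = (0.5 * log(n / (2 * pi * m * k))
                + m * log(n * p / m)
                + k * log(n * q / k))
    lower = exp(log_main + 1 / (12 * n) - 1 / (12 * m) - 1 / (12 * k))
    upper = exp(log_main + 1 / (12 * n + 6) - 1 / (12 * m + 6) - 1 / (12 * k + 6))
    return lower, upper


if __name__ == "__main__":
    low, high = stirling_limits(20, 8, 0.4)
    exact = comb(20, 8) * 0.4 ** 8 * 0.6 ** 12
    print(low, exact, high)
```

Even for n = 20 the two limits already enclose the exact value within about one part in a thousand.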

8. Another problem requiring the probability that the number of
successes will be contained between two given limits is much more
complex in case the number of trials as well as the difference between
given limits is a large number. Ordinarily for approximate evaluation
of probability under such circumstances simple and convenient formulas
are used. These formulas are derived in Chap. VII. Less known is
the ingenious use by Markoff of continued fractions for that purpose.

It suffices to devise a method for approximate evaluation of the
probability that the number of successes will be greater than a given
integer l which can be supposed > np. We shall denote this probability by
P(l). A similar notation Q(l) will be used to denote the probability
that the number of failures is > l where again l > nq. The probability
P(k, l) of the inequalities k ≤ m ≤ l can be expressed as follows:

$$P(k, l) = 1 - P(l) - Q(n - k)$$

if l > np and k < np;

$$P(k, l) = P(k - 1) - P(l)$$

if both k and l are > np; and finally

$$P(k, l) = Q(n - l - 1) - Q(n - k)$$

if both k and l are < np.

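For P(l) we have the expression

$$P(l) = \frac{n!}{(l+1)!(n-l-1)!}\, p^{l+1} q^{n-l-1} \left[1 + \frac{n-l-1}{l+2} \cdot \frac{p}{q} + \frac{(n-l-1)(n-l-2)}{(l+2)(l+3)} \left(\frac{p}{q}\right)^2 + \cdots \right].$$

The first factor

$$\frac{n!}{(l+1)!(n-l-1)!}\, p^{l+1} q^{n-l-1}$$

can be approximately evaluated by the method of the preceding section.
The whole difficulty resides in the evaluation of the sum

$$S = 1 + \frac{n-l-1}{l+2} \cdot \frac{p}{q} + \frac{(n-l-1)(n-l-2)}{(l+2)(l+3)} \left(\frac{p}{q}\right)^2 + \cdots$$

which is a particular case of the hypergeometric series

$$F(\alpha, \beta, \gamma, x) = 1 + \frac{\alpha\beta}{1 \cdot \gamma}\, x + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1 \cdot 2\, \gamma(\gamma+1)}\, x^2 + \cdots$$

In fact

$$S = F\left(-n+l+1,\ 1,\ l+2,\ -\frac{p}{q}\right).$$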

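Now, owing to this connection between S and hypergeometric series, S
can be represented in the form of a continued fraction. First, it is
easy to establish the following relations:

$$F(\alpha, \beta+1, \gamma+1, x) = F(\alpha, \beta, \gamma, x) + \frac{\alpha(\gamma-\beta)}{\gamma(\gamma+1)}\, x\, F(\alpha+1, \beta+1, \gamma+2, x)$$

$$F(\alpha+1, \beta, \gamma+1, x) = F(\alpha, \beta, \gamma, x) + \frac{\beta(\gamma-\alpha)}{\gamma(\gamma+1)}\, x\, F(\alpha+1, \beta+1, \gamma+2, x).$$

Substituting $\alpha + n$, $\beta + n$, $\gamma + 2n$ and $\alpha + n$, $\beta + n + 1$, $\gamma + 2n + 1$,
respectively, for $\alpha$, $\beta$, $\gamma$ in these relations and setting

$$X_{2n} = F(\alpha+n, \beta+n, \gamma+2n, x); \quad X_{2n+1} = F(\alpha+n, \beta+n+1, \gamma+2n+1, x)$$

$$a_{2n} = \frac{(\beta+n)(\gamma-\alpha+n)}{(\gamma+2n)(\gamma+2n-1)}; \quad a_{2n+1} = \frac{(\alpha+n)(\gamma-\beta+n)}{(\gamma+2n)(\gamma+2n+1)}$$

for brevity, we have

$$X_0 = X_1 - a_1 x X_2$$
$$X_1 = X_2 - a_2 x X_3$$
$$\cdots$$
$$X_{m-1} = X_m - a_m x X_{m+1}$$

whence

$$\frac{X_1}{X_0} = \cfrac{1}{1 - \cfrac{a_1 x}{1 - \cfrac{a_2 x}{1 - \cdots - \cfrac{a_m x}{X_m / X_{m+1}}}}}.$$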

In our particular case

$$X_1 = F(-n+l+1,\ 1,\ l+2,\ x); \quad X_0 = 1$$

and $a_{2n-2l-1} = 0$.

Hence, taking $x = -p/q$ and introducing new notations, we have a
finite continued fraction

$$S = \cfrac{1}{1 - \cfrac{c_1}{1 + \cfrac{d_1}{1 - \cfrac{c_2}{1 + \cfrac{d_2}{1 - \cdots - \cfrac{c_{n-l-1}}{1 + d_{n-l-1}}}}}}}$$

where

$$c_k = \frac{(n-k-l)(l+k)\,p}{(l+2k-1)(l+2k)\,q}; \quad d_k = \frac{k(n+k)\,p}{(l+2k)(l+2k+1)\,q}.$$

Every one of the numbers $c_k$ will be positive and < 1 if this is true for
$c_1$. Now

$$c_1 = \frac{(n-l-1)\,p}{(l+2)\,q} < 1$$

if l > np, and that is exactly what we suppose. The above continued
fraction can be used to obtain approximate values of S in excess or in
defect, as we please. Let us denote the continued fraction

$$\cfrac{c_k}{1 + \cfrac{d_k}{1 - \cfrac{c_{k+1}}{1 + \cdots}}}$$

by $\omega_k$. Then

$$0 < \omega_k < c_k,$$

which can be easily verified. Furthermore,

$$\omega_1 = \cfrac{c_1}{1 + \cfrac{d_1}{1 - \omega_2}}; \quad \omega_2 = \cfrac{c_2}{1 + \cfrac{d_2}{1 - \omega_3}}$$

and in general

$$\omega_k = \cfrac{c_k}{1 + \cfrac{d_k}{1 - \omega_{k+1}}}.$$

Having selected k, depending on the degree of approximation we
desire in the final result (but never too large; k = 5 or less generally
suffices), we use the inequality

$$0 < \omega_{k+1} < c_{k+1}$$

to obtain two limits in defect and in excess for $\omega_k$. Using these limits, we
obtain similar limits for $\omega_{k-1}, \omega_{k-2}, \omega_{k-3}, \ldots$ and, finally, for $\omega_1$ and S.

The series of operations will be better illustrated by an example. 

9. Let us find approximately the probability that in 9,000 trials an
event with the probability p = 1/3 will occur not more than 3,090 times
and not less than 2,910 times. To this end we must first seek the
probability of more than 3,090 occurrences, which involves, in the first
place, the evaluation of

$$T_{3091} = \frac{9000!}{3091!\,5909!}\left(\frac{1}{3}\right)^{3091}\left(\frac{2}{3}\right)^{5909}.$$

By using inequalities (3) and (4) of Sec. 7, we find

$$0.0011286 < T_{3091} < 0.0011287.$$

Next we turn to the continued fraction to evaluate the sum S. The
following table gives approximate values of $c_1, c_2, \ldots c_6$ and $d_1, d_2, \ldots d_5$
to 5 decimals and less than the exact numbers:

    n     c_n        d_n
    1     0.95553    0.00047
    2     0.95444    0.00094
    3     0.95335    0.00140
    4     0.95227    0.00187
    5     0.95119    0.00234
    6     0.95010

We start with the inequalities

$$0 < \omega_6 < 0.95011$$

and then proceed as follows:

$$1.00234 < 1 + \frac{d_5}{1 - \omega_6} < 1.04711; \quad 0.90839 < \omega_5 < 0.94898$$

$$1.02041 < 1 + \frac{d_4}{1 - \omega_5} < 1.03685; \quad 0.91842 < \omega_4 < 0.93324$$

$$1.01716 < 1 + \frac{d_3}{1 - \omega_4} < 1.02113; \quad 0.93362 < \omega_3 < 0.93728$$

$$1.01416 < 1 + \frac{d_2}{1 - \omega_3} < 1.01514; \quad 0.94020 < \omega_2 < 0.94113$$

$$1.00785 < 1 + \frac{d_1}{1 - \omega_2} < 1.00816; \quad 0.94779 < \omega_1 < 0.94810$$

$$\frac{1}{0.05221} < S < \frac{1}{0.05190}$$

$$0.02161 < S\,T_{3091} < 0.02175.$$

Hence, we know for certain that

$$0.02161 < P(3{,}090) < 0.02175.$$

By a similar calculation it was found that

$$0.02129 < Q(6{,}090) < 0.02142,$$

so that

$$0.04290 < P(3{,}090) + Q(6{,}090) < 0.04317.$$

The required probability P that the number of successes will be contained
between 2,910 and 3,090 (limits included) lies between 0.95683 and
0.95710 so that, taking P = 0.9570, the error in absolute value will be
less than $1.7 \times 10^{-4}$.

Problems for Solution 

1. What is the probability of having 12 three times in 100 tosses of 2 dice?
Ans.

$$C_{100}^{3}\left(\tfrac{1}{36}\right)^3\left(\tfrac{35}{36}\right)^{97} = 0.2257.$$

2. What is the probability for an event E to occur at least once, or twice, or three
times, in a series of n independent trials with the probability p? Ans.

(a) $1 - (1-p)^n$

(b) $1 - (1-p)^{n-1}\left[1 + (n-1)p\right]$

(c) $1 - (1-p)^{n-2}\left[1 + (n-2)p + \dfrac{(n-1)(n-2)}{2}\, p^2\right]$

3. What is the probability of having 12 points with 2 dice at least three times in
100 throws? Ans. 0.528.

4. In a series of 100 independent trials with the probability 1/3, what is the most
probable number of successes and its probability? Ans. $\mu = 33$; $T_{33} = 0.0844$.

Note: Log 100! = 157.97000; Log 67! = 94.56195; Log 33! = 36.93869.

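5. A player wins $1 if he throws heads two times in succession; otherwise he loses
25 cents. If this game is repeated 100 times, what is the probability that neither his
gain nor loss will exceed $1? Or $5? Ans.

(a)
$$\frac{100!}{20!\,80!}\left(\frac{1}{4}\right)^{20}\left(\frac{3}{4}\right)^{80} = 0.0493;$$

(b)
$$\frac{100!}{20!\,80!}\left(\frac{1}{4}\right)^{20}\left(\frac{3}{4}\right)^{80}\Bigl\{1 + \frac{80}{63} + \frac{80 \cdot 79}{63 \cdot 66} + \frac{80 \cdot 79 \cdot 78}{63 \cdot 66 \cdot 69} + \frac{80 \cdot 79 \cdot 78 \cdot 77}{63 \cdot 66 \cdot 69 \cdot 72}$$
$$\qquad + \frac{60}{81} + \frac{60 \cdot 57}{81 \cdot 82} + \frac{60 \cdot 57 \cdot 54}{81 \cdot 82 \cdot 83} + \frac{60 \cdot 57 \cdot 54 \cdot 51}{81 \cdot 82 \cdot 83 \cdot 84}\Bigr\} = 0.4506.$$

Note: Log 20! = 18.38612; Log 80! = 118.85473.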

6. Show that in a series of 2s trials with the probability 1/2 the most probable
number of successes is s and the corresponding probability

$$T_s = \frac{1 \cdot 3 \cdot 5 \cdots (2s-1)}{2 \cdot 4 \cdot 6 \cdots 2s}.$$

Show also that

$$T_s < \frac{1}{\sqrt{2s+1}}.$$

Hint:

$$T_s < \frac{2 \cdot 4 \cdot 6 \cdots 2s}{3 \cdot 5 \cdot 7 \cdots (2s+1)}.$$

7. Prove the following theorem: If P and P' are probabilities of the most probable
number of successes, respectively, in n and n + 1 trials, then P' ≤ P, the equality
sign being excluded unless (n + 1)p is an integer.

8. Show that the probability $T_\mu$ corresponding to the most probable number of
successes in n trials is asymptotic to $(2\pi npq)^{-1/2}$, that is,

$$\lim T_\mu \sqrt{2\pi npq} = 1 \quad \text{as } n \to \infty.$$

9. When p = }i, the following inequality holds for every m; 
10. What is the probability of 215 successes in 1,000 trials if p =

Ans. 0.0154.

11. What is the probability that in 2,000 trials the number of successes will be
contained between 460 and 540 (limits included) if p = Ans. 0.964.

12. Two players A and B agree to play until one of them wins a certain number of
games, the probabilities for A and B to win a single game being p and q = 1 − p.
However, they are forced to quit when A has a games still to win, and B has b games.
How should they divide their total stake to be fair?

This problem is known as "probleme de parties," one of the first problems on
probability discussed and solved by Fermat and Pascal in their correspondence.

Solution 1. Let P denote the probability that A will win a remaining games before
B can win b games, and let Q = 1 − P denote the probability for B to win b games
before A wins a games. To be fair, the players must divide their common stake M in
the ratio P:Q and leave the sum MP to A and the sum MQ to B.

To find P, notice that A wins in the following mutually exclusive ways:

a. If he wins in exactly a games; probability $p^a$.

b. If he wins in exactly a + 1 games; probability $\dfrac{a}{1}\, p^a q$.

c. If he wins in exactly a + 2 games; probability $\dfrac{a(a+1)}{1 \cdot 2}\, p^a q^2$.

. . .

n. If he wins in exactly a + b − 1 games; probability
$\dfrac{a(a+1) \cdots (a+b-2)}{1 \cdot 2 \cdot 3 \cdots (b-1)}\, p^a q^{b-1}$.

Consequently

$$P = p^a\left[1 + \frac{a}{1}\, q + \frac{a(a+1)}{1 \cdot 2}\, q^2 + \cdots + \frac{a(a+1) \cdots (a+b-2)}{1 \cdot 2 \cdots (b-1)}\, q^{b-1}\right]$$

and similarly

$$Q = q^b\left[1 + \frac{b}{1}\, p + \frac{b(b+1)}{1 \cdot 2}\, p^2 + \cdots + \frac{b(b+1) \cdots (b+a-2)}{1 \cdot 2 \cdots (a-1)}\, p^{a-1}\right].$$

Show directly that P + Q = 1.

Hint:

$$\frac{dP}{dp} + \frac{dQ}{dp} = 0.$$

Solution 2. The same problem can be solved in a different way. Whether A or B
wins will be decided in not more than a + b − 1 games. Now if the players continue
to play until the number of games reaches the limit a + b − 1, the number of games
won by A must be not less than a. And conversely, if this number is not less than a, A
will win a games before B wins b games. Therefore, P is the probability that in
a + b − 1 games A wins not less than a times, or

$$P = \frac{(a+b-1)!}{a!\,(b-1)!}\, p^a q^{b-1}\left[1 + \frac{b-1}{a+1} \cdot \frac{p}{q} + \frac{(b-1)(b-2)}{(a+1)(a+2)} \left(\frac{p}{q}\right)^2 + \cdots\right].$$

Show directly that both expressions for P are identical.

Hint: Proceed as before.
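
Before attempting the proofs, the two expressions for P and the relation P + Q = 1 can be checked numerically. The sketch below uses exact fractions; the two function names are ours.

```python
from fractions import Fraction
from math import comb


def p_first_solution(a: int, b: int, p: Fraction) -> Fraction:
    """Solution 1: A wins his a-th game at game a + j, for j = 0 .. b - 1."""
    q = 1 - p
    # a(a+1)...(a+j-1) / (1*2*...*j) is the binomial coefficient C(a+j-1, j).
    return p ** a * sum(comb(a + j - 1, j) * q ** j for j in range(b))


def p_second_solution(a: int, b: int, p: Fraction) -> Fraction:
    """Solution 2: at least a wins for A among a + b - 1 games."""
    q = 1 - p
    n = a + b - 1
    return sum(comb(n, m) * p ** m * q ** (n - m) for m in range(a, n + 1))


if __name__ == "__main__":
    p = Fraction(1, 2)
    print(p_first_solution(1, 2, p), p_second_solution(1, 2, p))
```

For a = 1, b = 2 and equal chances in a single game, both functions give 3/4, so the stake is divided in the ratio 3:1.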

13. Prove the identity

$$p^n + \frac{n}{1}\, p^{n-1} q + \frac{n(n-1)}{1 \cdot 2}\, p^{n-2} q^2 + \cdots + \frac{n(n-1) \cdots (n-k+1)}{1 \cdot 2 \cdots k}\, p^{n-k} q^k = \frac{\int_0^p x^{n-k-1}(1-x)^k\, dx}{\int_0^1 x^{n-k-1}(1-x)^k\, dx}.$$

Hint: Take derivatives with respect to p.

14. A and B have, respectively, n + 1 and n coins. If they toss their coins
simultaneously, what is the probability that (a) A will have more heads than B?
(b) A and B will have an equal number of heads? (c) B will have more heads than A?

Solution. a. Let $P_n$ be the probability for A to have more heads than B. This
probability can be expressed as the double sum

$$P_n = \frac{1}{2^{2n+1}} \sum_{x=1}^{n+1} \sum_{\alpha=0}^{n} C_{n+1}^{\alpha+x}\, C_n^{\alpha}.$$

Considering the coefficient of $t^x$ in

$$(1 + t)^{n+1}\left(1 + \frac{1}{t}\right)^n$$

we have

$$\sum_{\alpha=0}^{n} C_{n+1}^{\alpha+x}\, C_n^{\alpha} = C_{2n+1}^{n+x}.$$

Hence

$$P_n = \frac{1}{2^{2n+1}} \sum_{x=1}^{n+1} C_{2n+1}^{n+x} = \frac{2^{2n}}{2^{2n+1}} = \frac{1}{2}.$$

b. The probability $Q_n$ for A and B to have an equal number of heads is

$$Q_n = \frac{C_{2n+1}^{n}}{2^{2n+1}}.$$

c. The probability $R_n$ for B to have more heads than A is

$$R_n = \frac{1}{2} - \frac{C_{2n+1}^{n}}{2^{2n+1}}.$$
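
The three answers of Prob. 14 can be confirmed by counting outcomes directly for small n. This is a brute-force sketch; the function name `coin_comparison` is ours.

```python
from fractions import Fraction
from math import comb


def coin_comparison(n: int):
    """Return (A more heads, equal heads, B more heads); A tosses n + 1 coins, B tosses n."""
    total = 2 ** (2 * n + 1)
    more = sum(comb(n + 1, i) * comb(n, j)
               for i in range(n + 2) for j in range(n + 1) if i > j)
    equal = sum(comb(n + 1, j) * comb(n, j) for j in range(n + 1))
    return (Fraction(more, total),
            Fraction(equal, total),
            Fraction(total - more - equal, total))


if __name__ == "__main__":
    print(coin_comparison(3))
```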


15. If each of n independent trials can result in one of the m incompatible events
$E_1, E_2, \ldots E_m$ with the respective probabilities

$$p_1, p_2, \ldots p_m; \quad (p_1 + p_2 + \cdots + p_m = 1),$$

show that the probability to have $l_1$ events $E_1$, $l_2$ events $E_2$, . . . $l_m$ events $E_m$ where
$l_1 + l_2 + \cdots + l_m = n$ is given by

$$\frac{n!}{l_1!\, l_2! \cdots l_m!}\; p_1^{l_1} p_2^{l_2} \cdots p_m^{l_m}.$$

CHAPTER IV
PROBABILITIES OF HYPOTHESES AND BAYES' THEOREM

1. The nature of the problems with which we deal in this chapter may 
be illustrated by the following simple example: Urns 1 and 2 contain, 
respectively, 2 white and 3 black balls, and 4 white and 1 black balls. 
One of the urns is selected at random and one ball is drawn. It happens 
to be white. What is the probability that it came from the first urn? 
Before the ball was drawn and its color revealed, the probability that the 
first urn would be chosen had been 1/2; but the indication of the color 
of the ball that was drawn altered this probability. To find this new 
probability, the following artifice can be used: 

Imagine that balls from both urns are put together in a third urn. 
To distinguish their origin, balls from the first urn are marked with 1 
and those from the second urn are marked with 2. Since there are 5 
balls marked with 1 and the same number marked with 2, in taking one 
ball from the third urn we have equal chances to take one coming from 
either the first or the second urn, and the situation is exactly the same 
as if we chose one of the urns at random and drew one ball from it. 
If the ball drawn from the third urn happens to be white, this can happen 
in 2 + 4 = 6 equally likely cases. Only in 2 of these cases will the 
extracted ball have the mark 1. Hence, the probability that the white 
ball came from the first urn is 2/6 = 1/3.
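The artifice of the third urn amounts to a simple count, which may be sketched in Python as follows. The variable names are ours; the contents of the urns are those of the example.

```python
from fractions import Fraction

# Urn contents as (white, black), taken from the example above.
urns = {1: (2, 3), 2: (4, 1)}

# Pool every ball into a "third urn", marking each ball with its urn of origin.
third_urn = [(origin, color)
             for origin, (white, black) in urns.items()
             for color in ["white"] * white + ["black"] * black]

# Keep only the white balls, then count those that carry the mark 1.
white_balls = [ball for ball in third_urn if ball[1] == "white"]
from_first = [ball for ball in white_balls if ball[0] == 1]

prob_first_given_white = Fraction(len(from_first), len(white_balls))
print(prob_first_given_white)  # 1/3
```

The count is legitimate only because both urns hold the same number of balls, so that every ball in the third urn is equally likely to be the one drawn.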

The success of this artifice depends on the equality of the number of 
balls in both urns. It can be applied to the case of an unequal number 
of balls in the urns, but with some modifications; however, it seems 
preferable to follow a regular method for solving problems like the 
preceding one. 

2. The problem just solved is a particular case of the following fundamental

Problem 1. An event A can occur only if one of the set of exhaustive
and incompatible events

$B_1, B_2, \ldots B_n$

occurs. The probabilities of these events

$(B_1), (B_2), \ldots (B_n)$,

corresponding to the total absence of any knowledge as to the occurrence
or nonoccurrence of A, are known. Known also are the conditional
probabilities

$(A, B_i)$; $i = 1, 2, \ldots n$

for A to occur, assuming the occurrence of Bi. How does the proba- 
bility of Bi change with the additional information that A has actually 
happened? 

Solution. The question amounts to finding the conditional proba- 
bility {Bi, A). The probability of the compound event ABi can be 
presented in two forms 

\[ (AB_i) = (B_i)(A, B_i) \]
or
\[ (AB_i) = (A)(B_i, A). \]

Equating the right-hand members, we derive the following expression
for the unknown probability $(B_i, A)$:

\[ (B_i, A) = \frac{(B_i)(A, B_i)}{(A)}. \]

Since the event A can materialize in the mutually exclusive forms

$AB_1, AB_2, \ldots AB_n$,

by applying the theorem of total probability, we get

\[ (A) = (B_1)(A, B_1) + (B_2)(A, B_2) + \cdots + (B_n)(A, B_n). \]

It suffices now to introduce this expression into the preceding formula for
$(B_i, A)$ to get the final expression

\[ (1) \qquad (B_i, A) = \frac{(B_i)(A, B_i)}{(B_1)(A, B_1) + (B_2)(A, B_2) + \cdots + (B_n)(A, B_n)}. \]

This formula, when described in words, constitutes the so-called
"Bayes' theorem." However, it is hardly necessary to describe its
content in words; symbols speak better for themselves. For that
reason, we prefer to speak of Bayes' formula rather than of Bayes'
theorem. Bayes' formula is also known as the "formula for probabilities
of hypotheses." (The reason for that name is that the events $B_1, B_2, \ldots B_n$
may be considered as hypotheses to account for the occurrence of A.)
It is customary to speak of probabilities

$(B_1), (B_2), \ldots (B_n)$

as a priori probabilities of hypotheses

$B_1, B_2, \ldots B_n$,

while probabilities

$(B_i, A)$; $i = 1, 2, \ldots n$

are called a posteriori probabilities of the same hypotheses.
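Bayes' formula is short enough to be written as a small routine. The sketch below is in Python; the function name `posterior` and its arguments are our own choice. It takes the a priori probabilities $(B_i)$ and the conditional probabilities $(A, B_i)$ and returns the a posteriori probabilities $(B_i, A)$; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """A posteriori probabilities (B_i, A) from (B_i) and (A, B_i)."""
    # Probability of each compound event A B_i.
    joint = [b * a for b, a in zip(priors, likelihoods)]
    # Theorem of total probability: (A) is the sum of the (A B_i).
    total = sum(joint)
    return [j / total for j in joint]


# The two urns of Sec. 1: 2 white of 5 balls, and 4 white of 5 balls.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(2, 5), Fraction(4, 5)]
print(posterior(priors, likelihoods))  # [Fraction(1, 3), Fraction(2, 3)]
```

Applied to the urns of Sec. 1, the routine gives 1/3 for the first urn, in agreement with the result found there by direct counting.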

3. A few examples will help us to understand the meaning and the
use of Bayes' formula.


Example 1. The contents of urns 1, 2, 3, are as follows:

Urn 1: 1 white, 2 black, 3 red balls
Urn 2: 2 white, 1 black, 1 red balls
Urn 3: 4 white, 5 black, 3 red balls

One urn is chosen at random and two balls drawn. They happen to be white and red.
What is the probability that they came from urn 2 or 3?

Solution. The event A represents the fact that two balls taken from the selected
urn were of white and red color, respectively. To account for this fact, we have three
hypotheses: The selected urn was 1 or 2 or 3. We shall represent these hypotheses in
the order indicated by $B_1$, $B_2$, $B_3$. Since nothing distinguishes the urns, the probabilities
of these hypotheses before anything was known about A are

\[ (B_1) = (B_2) = (B_3) = \frac{1}{3}. \]

The probabilities of A, assuming these hypotheses, are

\[ (A, B_1) = \frac{1}{5}, \qquad (A, B_2) = \frac{1}{3}, \qquad (A, B_3) = \frac{2}{11}. \]

It remains now to introduce these values into formula (1) to have a posteriori
probabilities

\[
(B_2, A) = \frac{\frac13\cdot\frac13}{\frac13\cdot\frac15 + \frac13\cdot\frac13 + \frac13\cdot\frac{2}{11}} = \frac{55}{118},
\qquad
(B_3, A) = \frac{\frac13\cdot\frac{2}{11}}{\frac13\cdot\frac15 + \frac13\cdot\frac13 + \frac13\cdot\frac{2}{11}} = \frac{30}{118},
\]

and also, naturally,

\[ (B_1, A) = 1 - (B_2, A) - (B_3, A) = \frac{33}{118}. \]
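The arithmetic of Example 1 can be repeated mechanically. In the following Python sketch (the names are ours) the probability of drawing one white and one red ball from an urn is obtained by counting pairs, and formula (1) is then applied with equal a priori probabilities.

```python
from fractions import Fraction
from math import comb

# (white, black, red) contents of urns 1, 2, 3, as given in Example 1.
urns = [(1, 2, 3), (2, 1, 1), (4, 5, 3)]
prior = Fraction(1, 3)

# (A, B_i): pairs consisting of one white and one red ball, out of all pairs.
likelihoods = [Fraction(w * r, comb(w + b + r, 2)) for w, b, r in urns]

# Bayes' formula (1).
total = sum(prior * a for a in likelihoods)
posteriors = [prior * a / total for a in likelihoods]

print(likelihoods)  # 1/5, 1/3, 2/11
print(posteriors)   # 33/118, 55/118, 15/59 (that is, 30/118)
```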


Example 2. It is known that an urn containing altogether 10 balls was filled in 
the following manner: A coin was tossed 10 times, and according as it showed heads 
or tails, one white or one black ball was put into the urn. Balls are drawn from this 
urn one at a time, 10 times in succession (always being returned before the next draw- 
ing) and every one turns out to be white. What is the probability that the urn con- 
tains nothing but white balls? 

Solution. The event A consists in the fact that in 10 independent trials with a 
definite but unknown probability, only white balls appear. To account for this fact, 
we have 10 hypotheses regarding the number of white balls in the urn; namely, that 
this number is either 1, or 2, or 3, . . . or 10. The a priori probability of the hypo- 
thesis Bi that there are exactly i white balls in the urn, according to the manner in 
which the urn was filled, is the same as the probability of having i heads in 10 throws; 
that is, 

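\[ (B_i) = \frac{10!}{i!\,(10-i)!}\cdot\frac{1}{2^{10}}; \qquad i = 1, 2, \ldots 10. \]

Granted the hypothesis $B_i$, the probability of A is

\[ (A, B_i) = \left(\frac{i}{10}\right)^{10}. \]

The problem requires us to find $(B_{10}, A)$. The expression of this probability
immediately results from Bayes' formula:

\[ (B_{10}, A) = \frac{1}{\sum_{i=1}^{10} \frac{10!}{i!\,(10-i)!}\left(\frac{i}{10}\right)^{10}}. \]

The denominator of this fraction is

\[ \sum_{i=1}^{10} \frac{10!}{i!\,(10-i)!}\left(\frac{i}{10}\right)^{10} = 14.247. \]

Hence

\[ (B_{10}, A) = 0.0702. \]

This probability, although still small, is much greater than 1/1024, the a priori
probability of having only white balls in the urn.

If, instead of 10 drawings, m drawings have been made and at each drawing white
balls appeared, the probability $(B_{10}, A)$ would be given by

\[ (B_{10}, A) = \frac{1}{\sum_{i=1}^{10} \frac{10!}{i!\,(10-i)!}\left(\frac{i}{10}\right)^{m}}. \]

The denominator of this formula can be presented thus:

\[ \sum_{i=0}^{10} \frac{10!}{i!\,(10-i)!}\left(1 - \frac{i}{10}\right)^{m}. \]

Now

\[ \left(1 - \frac{i}{10}\right)^{m} \le e^{-\frac{mi}{10}} \]

and so

\[ \sum_{i=0}^{10} \frac{10!}{i!\,(10-i)!}\left(1 - \frac{i}{10}\right)^{m} < \sum_{i=0}^{10} \frac{10!}{i!\,(10-i)!}\, e^{-\frac{mi}{10}} = \left(1 + e^{-\frac{m}{10}}\right)^{10}. \]

Hence

\[ (B_{10}, A) > \left(1 + e^{-\frac{m}{10}}\right)^{-10}. \]

This shows that with increasing m the probability $(B_{10}, A)$ rapidly approaches 1.
For instance, if m = 100

\[ (B_{10}, A) > (1 + e^{-10})^{-10} > (1.0000454)^{-10} > 0.99954. \]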


Thus, after 100 drawings producing only white balls, it is almost certain that the 
urn contains nothing but white balls — a conclusion which mere common sense would 
dictate. 
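The numerical values of Example 2 are easily verified. The Python sketch below (the function names are ours) evaluates $(B_{10}, A)$ after m drawings, all white, the factor $1/2^{10}$ being canceled in numerator and denominator, and compares it with the lower limit $(1 + e^{-m/10})^{-10}$.

```python
from fractions import Fraction
from math import comb, exp


def prob_all_white(m, balls=10):
    """(B_10, A): the urn holds only white balls, given m white drawings."""
    # The common factor 1/2**balls of the a priori probabilities cancels.
    denominator = sum(comb(balls, i) * Fraction(i, balls) ** m
                      for i in range(1, balls + 1))
    return 1 / denominator


def lower_limit(m, balls=10):
    """The limit (1 + e^(-m/balls))^(-balls) derived in the text."""
    return (1 + exp(-m / balls)) ** (-balls)


print(float(1 / prob_all_white(10)))  # the denominator, about 14.247
print(float(prob_all_white(10)))      # about 0.0702
print(float(prob_all_white(100)), lower_limit(100))
```

For m = 100 the exact value is somewhat greater than the limit 0.99954 used in the text, as it should be.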

Example 3. Two urns, 1 and 2, contain respectively 2 white and 1 black ball,
and 1 white and 5 black balls. One ball is transferred from urn 1 to urn 2 and then
one ball is drawn from the latter. It happens to be white. What is the probability
that the transferred ball was black?

Solution. Here we have two hypotheses: $B_1$, that the transferred ball was black,
and $B_2$, that it was white. The a priori probabilities of these hypotheses are

\[ (B_1) = \frac{1}{3}, \qquad (B_2) = \frac{2}{3}. \]

The probabilities of drawing a white ball from urn 2, granted that $B_1$ or $B_2$ is true,
are:

\[ (A, B_1) = \frac{1}{7}, \qquad (A, B_2) = \frac{2}{7}. \]

The probability of $B_1$, after a white ball has been drawn from the second urn,
results from Bayes' formula:

\[ (B_1, A) = \frac{\frac13\cdot\frac17}{\frac13\cdot\frac17 + \frac23\cdot\frac27} = \frac{1}{5}. \]


4. Problem 2. Retaining the notations, conditions, and data of
Prob. 1, find the probability of materialization of another event C
granted that A has actually occurred. Conditional probabilities

$(C, AB_i)$; $i = 1, 2, \ldots n$

are supposed to be known.

Solution. Since the fact of the occurrence of A involves that of one,
and only one, of the events

$B_1, B_2, \ldots B_n$,

the event C (granted the occurrence of A) can materialize in the following
mutually exclusive forms

$CB_1, CB_2, \ldots CB_n$.

Consequently, the probability $(C, A)$ which we are seeking is given by

\[ (C, A) = (CB_1, A) + (CB_2, A) + \cdots + (CB_n, A). \]

Applying the theorem of compound probability, we have

\[ (CB_i, A) = (B_i, A)(C, AB_i) \]

and

\[ (C, A) = (B_1, A)(C, AB_1) + (B_2, A)(C, AB_2) + \cdots + (B_n, A)(C, AB_n). \]

It suffices now to substitute for

$(B_i, A)$

its expression given by Bayes' formula, to find the final expression

\[ (2) \qquad (C, A) = \frac{\sum_{i=1}^{n} (B_i)(A, B_i)(C, AB_i)}{\sum_{i=1}^{n} (B_i)(A, B_i)}. \]
It may happen that the materialization of hypothesis $B_i$ makes C
independent of A; then we have simply

\[ (C, AB_i) = (C, B_i) \]

and instead of formula (2), we have a simplified formula

\[ (3) \qquad (C, A) = \frac{\sum_{i=1}^{n} (B_i)(A, B_i)(C, B_i)}{\sum_{i=1}^{n} (B_i)(A, B_i)}. \]

The event C can be considered in regard to A as a future event. For
that reason formulas (2) and (3) express probabilities of future events.
For better understanding of these commonly used technical terms, we
shall consider a simple example.

Example 4. From an urn containing 3 white and 5 black balls, 4 balls are
transferred into an empty urn. From this urn 2 balls are taken and they both happen to
be white. What is the probability that the third ball taken from the same urn will
be white?

Solution. (a) Let us suppose that the two balls drawn in the first place are returned
to the second urn. Analyzing this problem, we distinguish first the following
hypotheses concerning colors of the 4 balls transferred from the first urn. Among them, there
are necessarily 2 white balls. Hence, there are only two possible hypotheses:

$B_1$: 2 white and 2 black balls;

$B_2$: 3 white and 1 black ball.

A priori probabilities of these hypotheses are

\[ (B_1) = \frac{C_3^2 \cdot C_5^2}{C_8^4} = \frac{3}{7}, \qquad (B_2) = \frac{C_3^3 \cdot C_5^1}{C_8^4} = \frac{1}{14}. \]

The event A consists in the white color of both balls drawn from the second urn.
The conditional probabilities $(A, B_1)$ and $(A, B_2)$ are

\[ (A, B_1) = \frac{1}{6}; \qquad (A, B_2) = \frac{1}{2}. \]

The future event C consists in the white color of the third ball. Since the 2 balls
drawn at first are returned, C becomes independent of A as soon as it is known which
one of the hypotheses has materialized. Hence

\[ (C, AB_1) = (C, B_1) = \frac{1}{2}, \qquad (C, AB_2) = (C, B_2) = \frac{3}{4}. \]

Substituting these various numbers in formula (3), we find that

\[ (C, A) = \frac{\frac37\cdot\frac16\cdot\frac12 + \frac{1}{14}\cdot\frac12\cdot\frac34}{\frac37\cdot\frac16 + \frac{1}{14}\cdot\frac12} = \frac{7}{12}. \]

(b) If the two balls drawn in the first place are not returned, we have

\[ (C, AB_1) = 0, \qquad (C, AB_2) = \frac{1}{2}. \]

Then, making use of formula (2),

\[ (C, A) = \frac{\frac{1}{14}\cdot\frac12\cdot\frac12}{\frac37\cdot\frac16 + \frac{1}{14}\cdot\frac12} = \frac{1}{6}. \]
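Formulas (2) and (3) differ only in the conditional probabilities of the future event, so that one routine serves for both. The following Python sketch (the name `future_event` is ours) reproduces both parts of Example 4.

```python
from fractions import Fraction
from math import comb


def future_event(priors, lik_a, lik_c):
    """(C, A) by formula (2); lik_c holds (C, A B_i), or (C, B_i) for formula (3)."""
    numerator = sum(b * a * c for b, a, c in zip(priors, lik_a, lik_c))
    denominator = sum(b * a for b, a in zip(priors, lik_a))
    return numerator / denominator


# Hypotheses B_1 (2 white, 2 black) and B_2 (3 white, 1 black) among the 4
# balls transferred from an urn with 3 white and 5 black balls.
priors = [Fraction(comb(3, 2) * comb(5, 2), comb(8, 4)),
          Fraction(comb(3, 3) * comb(5, 1), comb(8, 4))]
# (A, B_i): both balls drawn from the second urn are white.
lik_a = [Fraction(comb(2, 2), comb(4, 2)), Fraction(comb(3, 2), comb(4, 2))]

# (a) the two balls are returned; (b) they are not returned.
with_return = future_event(priors, lik_a, [Fraction(2, 4), Fraction(3, 4)])
without_return = future_event(priors, lik_a, [Fraction(0), Fraction(1, 2)])
print(with_return, without_return)  # 7/12 1/6
```

The a priori probabilities here do not add up to 1, because the remaining hypotheses are incompatible with A and contribute nothing to either sum.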


5. The following problem can easily be solved by direct application
of Bayes' formula.

Problem 3. A series of trials is performed, which, with certain
additional data, would appear as independent trials in regard to an event
E with a constant probability p.

Lacking these data, all we know is that the unknown probability p
must be one of the numbers

$p_1, p_2, \ldots p_k$

and we can assume these values with the respective probabilities

$\alpha_1, \alpha_2, \ldots \alpha_k$.

In n trials the event E actually occurred m times. What is the probability
that p lies between the two given limits $\alpha$ and $\beta$ ($0 \le \alpha < \beta \le 1$),
or else, what is the probability of the following inequalities:

\[ \alpha \le p \le \beta? \]

A particular case may illustrate the meaning of this problem. In a
set of N urns, $N\alpha_1$ urns have white balls in proportion $p_1$ to the total
number of balls; $N\alpha_2$ urns have white balls in proportion $p_2$; ... $N\alpha_k$
urns have white balls in proportion $p_k$. An urn is chosen at random and
n drawings of one ball at a time are performed, the ball being returned
each time before the next drawing so as to keep a constant proportion
of white balls. It is found that altogether m white balls have appeared.
What is the probability that one of the $N\alpha_i$ urns with the proportion
$p_i$ of white balls was chosen? Evidently this is a particular case of the
general problem, and here we possess knowledge of the necessary data,
provided that the probability of selecting any one of the urns is the same.

Solution. We distinguish k exhaustive and mutually exclusive
hypotheses that the unknown probability is $p_1$, or $p_2$, ... or $p_k$. The
a priori probabilities of these hypotheses are, respectively, $\alpha_1, \alpha_2, \ldots \alpha_k$.
Assuming the hypothesis $p = p_i$, the probability of the event E occurring
m times in n trials is

\[ C_n^m p_i^m (1 - p_i)^{n-m}. \]

Now, after E has actually happened m times in n trials, the a posteriori
probability of the hypothesis $p = p_i$, by virtue of Bayes' formula,
will be

\[ \frac{\alpha_i C_n^m p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} \alpha_i C_n^m p_i^m (1 - p_i)^{n-m}} \]

or, canceling $C_n^m$,

\[ \frac{\alpha_i p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} \alpha_i p_i^m (1 - p_i)^{n-m}}. \]

Now, applying the theorem of total probability, the probability P of the
inequalities

\[ \alpha \le p \le \beta \]

will be given by

\[ (4) \qquad P = \frac{\sum \alpha_i p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} \alpha_i p_i^m (1 - p_i)^{n-m}} \]

where the summation in the numerator refers to all values of $p_i$ lying
between $\alpha$ and $\beta$, limits included.

An important particular case arises when the set of hypothetical
probabilities is

\[ p_1 = \frac{1}{k}, \quad p_2 = \frac{2}{k}, \quad \ldots \quad p_k = 1 \]

and the a priori probabilities of these hypotheses are equal:

\[ \alpha_1 = \alpha_2 = \cdots = \alpha_k = \frac{1}{k}. \]

Then the fraction 1/k can be canceled in both numerator and denominator.
The final formula for the probability of the inequalities

\[ \alpha \le p \le \beta \]

will be

\[ (5) \qquad P = \frac{\sum p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} p_i^m (1 - p_i)^{n-m}}, \]

summation in the numerator being extended over all positive integers i
satisfying the inequalities

\[ k\alpha \le i \le k\beta. \]

In the limit, when k tends to infinity, the a priori probability of the
inequalities

\[ \alpha \le p \le \beta \]

is given simply by the length $\beta - \alpha$ of the interval $(\alpha, \beta)$. The a posteriori
probability of the same inequalities is obtained as the limit of
expression (5). Now, as $k \to \infty$, the sums

\[ \frac{1}{k} \sum_{k\alpha \le i \le k\beta} \left(\frac{i}{k}\right)^{m} \left(1 - \frac{i}{k}\right)^{n-m} \]

and

\[ \frac{1}{k} \sum_{i=1}^{k} \left(\frac{i}{k}\right)^{m} \left(1 - \frac{i}{k}\right)^{n-m} \]

tend to the definite integrals

\[ \int_\alpha^\beta x^m (1 - x)^{n-m}\,dx \]

and

\[ \int_0^1 x^m (1 - x)^{n-m}\,dx. \]

Therefore, in the limit, the a posteriori probability of the inequalities

\[ \alpha \le p \le \beta \]

is expressed by the ratio of two definite integrals

\[ (6) \qquad P = \frac{\int_\alpha^\beta x^m (1 - x)^{n-m}\,dx}{\int_0^1 x^m (1 - x)^{n-m}\,dx}. \]

This formula leads to the following conclusion: When the unknown
probability p of an event E may have any value between 0 and 1 and the a
priori probability of its being contained between limits $\alpha$ and $\beta$ is $\beta - \alpha$,
then after n trials in which E occurred m times, the a posteriori probability
of p being contained between $\alpha$ and $\beta$ is given by formula (6).
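The passage from formula (5) to formula (6) can be watched numerically. In the Python sketch below (all names are ours) the integrals of formula (6) are found exactly by expanding $(1 - x)^{n-m}$ according to the binomial theorem, and formula (5) is evaluated for increasing k with the hypothetical probabilities $p_i = i/k$.

```python
from fractions import Fraction
from math import comb


def integral(m, n, lo, hi):
    """Exact integral of x^m (1 - x)^(n - m) from lo to hi."""
    def antiderivative(x):
        # Term-by-term integration of the binomial expansion.
        return sum((-1) ** j * comb(n - m, j) * x ** (m + j + 1) / (m + j + 1)
                   for j in range(n - m + 1))
    return antiderivative(hi) - antiderivative(lo)


def formula_6(m, n, alpha, beta):
    return integral(m, n, alpha, beta) / integral(m, n, Fraction(0), Fraction(1))


def formula_5(m, n, alpha, beta, k):
    # With p_i = i/k the common factor 1/k^n cancels in the ratio.
    weights = {i: i ** m * (k - i) ** (n - m) for i in range(1, k + 1)}
    numerator = sum(w for i, w in weights.items() if k * alpha <= i <= k * beta)
    return Fraction(numerator, sum(weights.values()))


alpha, beta = Fraction(1, 2), Fraction(1)
print(formula_6(3, 4, alpha, beta))  # 13/16
for k in (10, 100, 1000):
    print(k, float(formula_5(3, 4, alpha, beta, k)))
```

With m = 3, n = 4 and the limits 1/2 and 1, formula (6) gives 13/16, and the values of formula (5) draw nearer to it as k grows.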

6. Problem 4. Assumptions and data being the same as in Prob. 3,
find the probability that in $n_1$ trials, following n trials which produced
E m times, the same event will occur $m_1$ times.

Solution. It suffices to take in formula (3)

\[ (B_i) = \alpha_i, \qquad (A, B_i) = C_n^m p_i^m (1 - p_i)^{n-m} \]

and

\[ (C, B_i) = C_{n_1}^{m_1} p_i^{m_1} (1 - p_i)^{n_1 - m_1} \]

to find for the required probability this expression:

\[ (7) \qquad Q = C_{n_1}^{m_1} \frac{\sum_{i=1}^{k} \alpha_i p_i^{m+m_1} (1 - p_i)^{n-m+n_1-m_1}}{\sum_{i=1}^{k} \alpha_i p_i^{m} (1 - p_i)^{n-m}}. \]

Supposing again

\[ p_1 = \frac{1}{k}, \quad p_2 = \frac{2}{k}, \quad \ldots \quad p_k = 1; \qquad \alpha_1 = \alpha_2 = \cdots = \alpha_k = \frac{1}{k} \]

and letting $k \to \infty$, formula (7) in the limit becomes

\[ (8) \qquad Q = C_{n_1}^{m_1} \frac{\int_0^1 x^{m+m_1} (1 - x)^{n-m+n_1-m_1}\,dx}{\int_0^1 x^{m} (1 - x)^{n-m}\,dx}. \]

This formula leads to the following conclusion: When the unknown
probability p of an event E may have any value between limits 0 and 1
and the a priori probability of its being contained between $\alpha$ and $\beta$ is
$\beta - \alpha$ (so that equal probabilities correspond to intervals of equal length),
the probability that the event E will happen $m_1$ times in $n_1$ trials following
n trials which produced E m times is given by formula (8).

In particular, for $n_1 = m_1 = 1$ (evaluating integrals by the known
formula), we have

\[ Q = \frac{m + 1}{n + 2}. \]

This is the much disputed "law of succession" established by Laplace.
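The integrals in formula (8) are of the kind $\int_0^1 x^a (1 - x)^b\,dx = a!\,b!/(a + b + 1)!$, which we take to be the known formula alluded to. On that assumption, formula (8) can be evaluated exactly, as in the following Python sketch (the function names are ours).

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Integral of x^a (1 - x)^b over (0, 1) for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def formula_8(m, n, m1, n1):
    """Probability of m1 occurrences in n1 trials after m occurrences in n trials."""
    return (comb(n1, m1) * beta_integral(m + m1, n - m + n1 - m1)
            / beta_integral(m, n - m))


# Law of succession: one further trial after m occurrences in n trials.
print(formula_8(7, 10, 1, 1))  # 2/3, that is (7 + 1)/(10 + 2)

# The probabilities for m1 = 0, 1, ... n1 must add up to 1.
print(sum(formula_8(3, 7, m1, 5) for m1 in range(6)))  # 1
```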

7. Bayes’ formula, and other conclusions derived from it, are neces- 
sary consequences of fundamental concepts and theorems of the theory of 
probability. Once we admit these fundamentals, we must admit Bayes’ 
formula and all that follows from it. 

But the question arises: When may the various results established 
in this chapter be legitimately applied? In general, they may be applied 
whenever all the conditions of their validity are fulfilled; and in some 
artificial theoretical problems like those considered in this chapter, they 
unquestionably are legitimately applied. But in the case of practical 
applications it is not easy to make sure that all the conditions of validity 
are fulfilled, though there are some practical problems in which the use 
of Bayes' formula is perfectly legitimate.¹ In the history of probability
it has happened that even the most illustrious men, like Laplace and 
Poisson, went farther than they were entitled to go and made free use 
principally of formulas (6) and (8) in various important practical prob- 
lems. Against the indiscriminate use of these formulas sharp objections 
have been raised by a number of authors, especially in modern times. 

The first objection is of a general nature and hits the very existence 
of a priori probabilities. If an urn is given to us and we know only that 
it contains white and black balls, it is evident that no means are available 
to estimate a priori probabilities of various hypotheses as to the propor- 
tion of white balls. Hence, critics say, a priori probabilities do not exist 
at all, and it is futile to attempt to apply Bayes’ formula to an urn with 
an unknown proportion of balls. At first this objection may appear
very convincing, but its force is somewhat lessened by considering the
peculiar mode of existence of mathematical objects.

¹ One such problem can be found in an excellent book by Thornton C. Fry,
"Probability and Its Engineering Uses," New York, 1928.

Some property of integers, unknown to me, is not present in my 
mind, but it is hardly permissible to say that it does not exist; for it does 
exist in the minds of those who discover this property and know how to 
prove it. 

Similarly, our urn might have been filled by some person, or selected 
from among urns with known contents. To this person the a priori 
probabilities of various proportions of white and black balls might 
have been known. To us they are unknown, but this should not prevent 
us from attributing to them some potential mode of existence at least as 
a sort of belief. 

To admit a belief in the existence of certain unknown numbers is 
common to all sciences where mathematical analysis is applied to the 
world of reality. If we are allowed to introduce the element of belief
into such "exact" sciences as astronomy and physics, it would be only
fair to admit it in practical applications of probability. 

The second and very serious objection is directed against the use of 
formula (6), and for similar reasons against formula (8). Imagine, 
again, that we are provided with an urn containing an enormous number 
of white and black balls in completely unknown proportion. Our aim 
is to find the probability that the proportion of white balls to the total 
number of balls is contained between two given limits. To that end, we 
make a long series of trials as described in Prob. 3 and find that actually
in n trials, white balls appeared m times. The probability we seek would 
result from Bayes’ formula, provided numerical values of a priori proba- 
bilities, assumed on belief to be existent, were known. Lacking such 
knowledge, an arbitrary assumption is made, namely, that all the a 
priori probabilities have the same value. Then, on account of the 
enormous number of balls in our urn, formula (6) can be used as an 
approximate expression of P. It can be shown that, given an arbitrary
positive number $\epsilon$, however small, the probability of the inequalities

\[ \frac{m}{n} - \epsilon < p < \frac{m}{n} + \epsilon \]

can be made as near to 1 as we please by taking the number of trials
greater than a certain number $N(\epsilon)$ depending upon $\epsilon$ alone. In other
words, with practical certainty we can expect the proportion of white
balls to the total number of balls in our urn to be contained within
arbitrarily narrow limits

\[ \frac{m}{n} - \epsilon \quad \text{and} \quad \frac{m}{n} + \epsilon. \]

A conclusion like this would certainly be of the greatest importance.
But it is vitiated by the arbitrary assumption made at the beginning.
The same is true of formula (8) and of Laplace's "law of succession."
The objection against using formulas (6) and (8) in circumstances where
we are not entitled to use them appears to us as irrefutable, and the
numerical applications made by Laplace and others cannot inspire much
confidence.

As an example of the extremes to which the illegitimate use of formulas
(6) and (8) may lead, we quote from Laplace:

En faisant, par exemple, remonter la plus ancienne époque de l'histoire à
cinq mille ans, ou à 1,826,213 jours, et le Soleil s'étant levé constamment, dans
cet intervalle, à chaque révolution de vingt-quatre heures, il y a 1,826,214 à parier
contre un qu'il se lèvera encore demain.

[Placing, for example, the most ancient epoch of history at five thousand years
ago, or at 1,826,213 days, and the Sun having risen constantly, in this interval, at
each revolution of twenty-four hours, it is a bet of 1,826,214 to one that it will rise
again tomorrow.]

It appears strange that as great a man as Laplace could make such a
statement in earnest. However, under proper conditions, it would
not be so objectionable. If, from the enormous number N + 1 of
urns containing each N black and white balls in all possible proportions,
one urn is taken and 1,826,213 balls are drawn and returned, and they
all turn out to be white, then nobody can deny that there are very nearly
1,826,214 chances against one that the next ball will also be white.
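The statement about the N + 1 urns can be tried on a smaller scale. In the Python sketch below (the function name and the numbers chosen are ours) one of N + 1 urns, holding 0, 1, ... N white balls among N, is taken at random; the probability that the next ball is white, after a given number of drawings all of which produced white balls, is computed by formula (3) and compared with the law of succession.

```python
from fractions import Fraction


def next_white(num_balls, draws):
    """Probability of one more white ball after `draws` white drawings in a row."""
    # Formula (3) with equal a priori probabilities and p_i = i / num_balls:
    # sum of p_i^(draws + 1) divided by sum of p_i^draws.
    numerator = sum(i ** (draws + 1) for i in range(num_balls + 1))
    denominator = num_balls * sum(i ** draws for i in range(num_balls + 1))
    return Fraction(numerator, denominator)


draws = 10
for num_balls in (10, 100, 1000):
    print(num_balls, float(next_white(num_balls, draws)))
print(float(Fraction(draws + 1, draws + 2)))  # law of succession: 11/12
```

With 10 drawings the values approach 11/12 as N grows, that is, very nearly 11 chances against one.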

Problems for Solution

1. Three urns of the same appearance have the following proportions of white and
black balls:

Urn 1: 1 white, 2 black balls
Urn 2: 2 white, 1 black ball
Urn 3: 2 white, 2 black balls

One of the urns is selected and one ball is drawn. It turns out to be white. What
is the probability that the third urn was chosen? Ans. 1/3.

2. Under the same conditions, what is the probability of drawing a white ball
again, the first one not having been returned? Ans. 1/3.

3. An urn containing 5 balls has been filled up by taking 5 balls from another urn,
which originally had 5 white and 5 black balls. A ball is taken from the first urn, and
it happens to be black. What is the probability of drawing a white ball from among
the remaining 4? Ans. 5/9.

4. From an urn containing 5 white and 5 black balls, 5 balls are transferred into an
empty second urn. From there, 3 balls are transferred into an empty third urn and,
finally, one ball is drawn from the latter. It turns out to be white. What is the
probability that all 5 balls transferred from the first urn are white? Ans. 1/126.

5. Conditions and notations being the same as in Prob. 3 (page 66), show that the
probability for an event to occur in the (n + 1)st trial, granted that it has occurred
in all the preceding n trials, is never less than the probability for the same event to
occur in the nth trial, granting that it has occurred in the preceding n - 1 trials.

Hint: it must be proved that

\[ \frac{\sum_{i=1}^{k} \alpha_i p_i^{n+1}}{\sum_{i=1}^{k} \alpha_i p_i^{n}} \ge \frac{\sum_{i=1}^{k} \alpha_i p_i^{n}}{\sum_{i=1}^{k} \alpha_i p_i^{n-1}}. \]

For that purpose, use Cauchy's inequality

\[ \left( \sum_{i=1}^{k} a_i b_i \right)^2 \le \sum_{i=1}^{k} a_i^2 \cdot \sum_{i=1}^{k} b_i^2. \]

6. Assuming that the unknown probability p of an event E can have any value
between 0 and 1 and that the a priori probability of its being contained in the interval
$(\alpha, \beta)$ is equal to the length of this interval, prove the following theorem: The
probability a posteriori of the inequality

\[ p \le \sigma \]

after E has occurred m times in n trials is equal to the probability of at least m + 1
successes in n + 1 independent trials with constant probability $\sigma$. (See Prob. 13,
page 59.)

7. Assumptions being the same as in the preceding problem, find approximately 
the probability a posteriori of the inequalities 

^ P ^ Uh 

it being known that in 200 trials an event with the probability p has occurred 105
times. Ans. Using the preceding problem and applying Markoff's method, we find
P = 0.846.

8. An urn contains N white and black balls in unknown proportion. The number
of white balls hypothetically may be

0, 1, 2, ... N

and all these hypotheses are considered as equally likely. Altogether n balls are
taken from the urn, m of which turned out to be white. Without returning these
balls, a new group of $n_1$ balls is taken, and it is required to find the probability that
among them there are $m_1$ white balls. Naturally, the total number of balls is so
large as to have $n + n_1 < N$. Ans. The required probability has the same expression

\[ C_{n_1}^{m_1} \frac{\int_0^1 x^{m+m_1} (1 - x)^{n-m+n_1-m_1}\,dx}{\int_0^1 x^{m} (1 - x)^{n-m}\,dx} \]

as in Prob. 4, page 69.


Polynomials ordinarily called "Hermite's polynomials," although they were
discovered by Laplace, are defined by

\[ H_n(y) = e^{\frac{y^2}{2}} \frac{d^n e^{-\frac{y^2}{2}}}{dy^n}. \]

The first four of them are

\[ H_1(y) = -y; \quad H_2(y) = y^2 - 1; \quad H_3(y) = -y^3 + 3y; \quad H_4(y) = y^4 - 6y^2 + 3. \]

They possess the remarkable property of orthogonality:

\[ \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} H_m(y) H_n(y)\,dy = 0 \quad \text{when } m \ne n, \]

while

\[ \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} H_n(y)^2\,dy = \sqrt{2\pi}\, n!. \]

Under very general conditions, a function f(y) defined in the interval $(-\infty, +\infty)$
can be represented by a series


fiy) = ao 4- aiHiiy) -f a^E^iy) + 


where in general 


Let 


1 


dk = 


= c 2 
2irJ_ «, 


and = 


f(y)H,(y)dy. 


n a(l “ a) 

provided 0 < a < 1. 

9. Prove the validity of the following expansion indicated by Ch. Jordan: 


(n + Dl 

m!(n — m) f 


.^m(i _ 


h 


VS’t 


1 - 20 ! 

1 ~hHAy) + 

ri + 2 

2n - (lln + 6)q:(1 - ol) 


h^.iy) + 


271(71 -}- 2) (?2p -f- 3) 
for $0 \le x \le 1$ where y is a new variable connected to x by the equation

\[ x = \alpha + \frac{y}{h}. \]

Hint: Consider the development in a series of Hermite’s polynomials of the 
function 


m 


= — a — for — /la S h,{l ~ a) 

f{y) =0 if either y < —ha or y > h{l — a). 


10. Assuming that the conditions of validity of formula (6) are fulfilled, show that 
the a posteriori probability of the inequalities 


m 

I 

n 


a(l — a) 


<P < 


m fi 

i" ' 

n \ 


a(l — a) 


m 

a — — 
n 


can be expanded into a convergent series 


p = 


f 


\/^Jo 


't 

e ^ dy — 


te ^2n- (lln + 6)a(l - a) 
{71 +2){n 4- 3)a(l — a) 


4- • • 


When n is large and $\alpha$ is near neither 0 nor 1, two terms of this series suffice
to give a good approximation to P (Ch. Jordan). Apply this to Prob. 7.

Ans. 0.84585.
CHAPTER V


USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS 

OF PROBABILITY 

1. The combined use of the theorems of total and compound proba- 
bility very often leads to an equation in finite differences which, together 
with the initial conditions supplied by a problem itself, serves to deter- 
mine an unknown probability. This method of attack is very powerful, 
and it is often resorted to, especially in the more difficult cases. In this 
chapter the use of equations in finite differences, applied to a few selected 
and comparatively easy examples, will be shown; but in Chap. VIII 
we shall apply the method to a class of interesting and historically 
important problems. 

Certain preliminary explanations are necessary at this point. Again 
we consider a series of trials resulting in an event E or its opposite, F, 
but this time we suppose that the trials are dependent, so that the 
probability of E at a certain trial may vary according to the available
information concerning the results of some of the other trials. 

A simple and interesting case of dependent trials arises if we suppose 
that the probability of E in the (n + l)st trial receives a definite value 
a if E has happened in the preceding nth trial, and this value does not 
change whatever further information we may possess concerning the 
results of trials preceding the nth. Also, the probability of E in the 
(n + 1)st trial receives another determined value $\beta$ if E failed in
the nth trial, no matter what happened in the trials preceding the nth. 

We have a simple illustration of this kind of dependence, if we suppose 
that drawings are made from an urn containing black and white balls in 
a known proportion, and that each ball drawn is returned to the urn, but 
only after the next drawing has been made. It is obvious that the proba- 
bility that the (n + l)st ball drawn will be white, becomes perfectly 
definite if we know what was the color of the ball immediately preceding, 
and it remains the same no matter what we know about the colors of the 
1, 2, . . . {n — l)st balls. 

If the trials depend on each other in the above-defined manner, we
say that they constitute a "simple chain," to use the terminology of the
late A. A. Markoff, who was the first to make a profound study of
dependent trials of this and similar, but more complicated, types. It is
implied in the definition of a simple chain that it breaks into two
separate parts as soon as the result of a certain trial becomes known. For
instance, if the result of the fifth trial is known, trials 6, 7, 8, ... become
independent of trials 1, 2, 3, 4, and the chain breaks into two distinct
parts: the trials preceding the fifth, and those following it. If the
results of trials 1, 2, 3, ... (n - 1) remain unknown, the event E
in the following nth trial has a certain probability which we shall denote
by $p_n$. Also, if it becomes known that E happened at trial k, where
k < n - 1, the probability of E happening in the nth trial receives a
different value, $p_n^{(k)}$. It is important to find means to determine the
probability $p_n$, the a priori probability of E in the nth trial when the
results of the preceding trials remain unknown; as well as to determine
the probability $p_n^{(k)}$ of E in the nth trial when we possess the positive
information that E has materialized in the kth (k < n - 1) trial.

2. Thus we are led to the following problem concerning simple chains 
of dependent trials: 

Problem 1. The initial probability $p_1$ of the event E in a simple
chain of trials being known, find the probability $p_n$ of E in the nth trial
when the results of the preceding trials remain completely unknown.
Also, find the probability $p_n^{(k)}$ of E in the nth trial when it is known that
E has happened in the kth trial where k < n - 1.

Solution. In the nth trial the event E can happen either preceded
by E in the (n - 1)st trial, the probability of which is $p_{n-1}$, or preceded
by F in the (n - 1)st trial, the probability of which is $1 - p_{n-1}$. By
the theorem of compound probability, the probability of the succession
EE is $p_{n-1}\alpha$, while the probability of the succession FE is $(1 - p_{n-1})\beta$.
Hence, the total probability $p_n$ is

\[ (1) \qquad p_n = \alpha p_{n-1} + \beta(1 - p_{n-1}) = (\alpha - \beta)p_{n-1} + \beta. \]

This is an ordinary equation in finite differences. It has a particular
solution

\[ p_n = c = \text{const.} \]

where c is determined by the equation

\[ c = (\alpha - \beta)c + \beta, \]

whence

\[ c = \frac{\beta}{1 + \beta - \alpha}, \]

provided $1 + \beta - \alpha \ne 0$.¹ On the other hand, the corresponding
homogeneous equation

\[ y_n = (\alpha - \beta)y_{n-1} \]

has a general solution

\[ y_n = C(\alpha - \beta)^{n-1} \]

involving an arbitrary constant C. Adding to it the previously found
particular solution, we obtain the general solution of (1) in the form

\[ p_n = C(\alpha - \beta)^{n-1} + \frac{\beta}{1 + \beta - \alpha}. \]

The arbitrary constant C is determined by the initial condition

\[ C + \frac{\beta}{1 + \beta - \alpha} = p_1, \]

so that finally

\[ p_n = \left( p_1 - \frac{\beta}{1 + \beta - \alpha} \right)(\alpha - \beta)^{n-1} + \frac{\beta}{1 + \beta - \alpha}. \]

If

\[ p_1 = \frac{\beta}{1 + \beta - \alpha}, \]

we see that $p_n$ does not depend on n and is constantly equal to $p_1$.
Because we may exclude the cases $\alpha - \beta = 1$ or $\alpha - \beta = -1$, so that
$\alpha - \beta$ is contained between -1 and 1, we may conclude from the above
expression that $p_n$, if not a constant, at any rate tends to the limit

\[ \frac{\beta}{1 + \beta - \alpha} \]

as n increases indefinitely.

¹ If $1 + \beta - \alpha = 0$ or $\alpha - \beta = 1$, we necessarily have $\alpha = 1$, $\beta = 0$, which
means that E must occur in all the trials if it actually occurs in the first trial, and
never occurs if it does not actually occur at the outset. This case, as well as the other
extreme case in which $\alpha - \beta = -1$, can therefore be excluded as not possessing real
interest.

As to $p_n^{(k)}$, we find in a similar way that it satisfies the equation

(2) $p_n^{(k)} = \alpha p_{n-1}^{(k)} + \beta(1 - p_{n-1}^{(k)})$

of the same form as equation (1). But the initial condition in this
case is $p_{k+1}^{(k)} = \alpha$ because the probability of E happening in the (k + 1)st
trial is $\alpha$ when it is known that E occurred in the preceding trial. The
solution of (2) satisfying this initial condition is

$p_n^{(k)} = \dfrac{\beta}{1 + \beta - \alpha} + \dfrac{1 - \alpha}{1 + \beta - \alpha}(\alpha - \beta)^{n-k}.$

As the second term in the right-hand member decreases with increasing
n and finally becomes less than any given number, we see that the
positive information concerning the result of the kth trial has less and less
influence on the probability of E in the following trials, and in remote
trials this influence becomes quite insignificant.
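
The solution of equation (1) can be checked numerically. The following is only a sketch: the function names and the values chosen for $\alpha$, $\beta$, $p_1$ are hypothetical and serve merely to compare the step-by-step recurrence with the closed expression and its limit.

```python
from fractions import Fraction

def iterate_chain(alpha, beta, p1, n):
    """Apply equation (1) n - 1 times, starting from p_1."""
    p = p1
    for _ in range(n - 1):
        p = alpha * p + beta * (1 - p)
    return p

def closed_form_chain(alpha, beta, p1, n):
    """General solution of (1) with C fixed by the initial condition."""
    limit = beta / (1 + beta - alpha)
    return (p1 - limit) * (alpha - beta) ** (n - 1) + limit

# Hypothetical values, chosen only for illustration.
alpha, beta, p1 = Fraction(7, 10), Fraction(1, 5), Fraction(1, 2)
for n in range(1, 11):
    assert iterate_chain(alpha, beta, p1, n) == closed_form_chain(alpha, beta, p1, n)

# For large n the value is close to the limit beta / (1 + beta - alpha) = 2/5.
print(float(closed_form_chain(alpha, beta, p1, 40)))
```

Exact fractions are used so that the two expressions can be compared for equality rather than within a tolerance.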

Example. An urn contains a white and b black balls, and a series of drawings of
one ball at a time is made, the ball removed being returned to the urn immediately
after the taking of the next following ball. What is the probability that the nth ball
drawn is white when: (a) nothing is known about the preceding drawings; (b) the kth
ball drawn is white?

In this particular problem we have

$\alpha = \dfrac{a - 1}{a + b - 1}, \quad \beta = \dfrac{a}{a + b - 1}, \quad p_1 = \dfrac{a}{a + b}$

and

$\dfrac{\beta}{1 + \beta - \alpha} = \dfrac{a}{a + b} = p_1.$

Thus

$p_n = p_1 = \dfrac{a}{a + b}.$

That is, the probability for any ball drawn to be white is the same as that for the
first ball, nothing being known about the results of the previous drawings. The
expression for $p_n^{(k)}$ is, in this example,

$p_n^{(k)} = \dfrac{a}{a + b} + \dfrac{b}{a + b} \cdot \dfrac{(-1)^{n-k}}{(a + b - 1)^{n-k}}.$

So, for instance, if a = 1, b = 2, n = 5, k = 3,

$p_5^{(3)} = \dfrac{1}{3} + \dfrac{2}{3 \cdot 2^2} = \dfrac{1}{2};$

the information that the third ball was white raises to 1/2 the probability that the fifth
ball will be white; it would be 1/3 without such information.
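
The numerical instance above can be reproduced by iterating equation (2) from its initial condition. This is a sketch only; the function names are hypothetical, and the closed expression used for comparison is the one obtained by substituting the urn values of $\alpha$ and $\beta$ into the solution of (2).

```python
from fractions import Fraction

def conditional_white(a, b, n, k):
    """p_n^(k) by iterating equation (2), starting from p_{k+1}^(k) = alpha."""
    alpha = Fraction(a - 1, a + b - 1)
    beta = Fraction(a, a + b - 1)
    p = alpha
    for _ in range(n - k - 1):
        p = alpha * p + beta * (1 - p)
    return p

def conditional_white_closed(a, b, n, k):
    """Closed expression: a/(a+b) + b/(a+b) * (-1/(a+b-1))^(n-k)."""
    return Fraction(a, a + b) + Fraction(b, a + b) * Fraction(-1, a + b - 1) ** (n - k)

print(conditional_white(1, 2, 5, 3))         # 1/2
print(conditional_white_closed(1, 2, 5, 3))  # 1/2
```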


3. The next problem chosen to illustrate the use of difference equations
is interesting in several respects. It was first propounded and
solved by de Moivre.

Problem 2. In a series of independent trials, an event E has the 
constant probability p. If, in this series, E occurs at least r times in 
succession, we say that there is a run of r successes. What is the proba- 
bility of having a run of r successes in n trials, where naturally n > r? 

Solution. Let us denote by $y_n$ the unknown probability of a run of
r in n trials. In n + 1 trials the probability of a run of r will then be
$y_{n+1}$. Now, a run of r in n + 1 trials can happen in two mutually
exclusive ways: first, if there is a run of r in the first n trials, and second,
if such a run can be obtained only in n + 1 trials. The probability of
the first hypothesis is $y_n$. To find the probability of the second hypothesis,
we observe that it requires the simultaneous realization of the following
conditions:

(a) There is no run of r in the first n - r trials, the probability of
which is $1 - y_{n-r}$. (b) In the (n - r + 1)st trial, E does not occur,
the probability of which is q = 1 - p. (c) Finally, E occurs in the
remaining r trials, the probability of which is $p^r$.

As (a), (b), (c) are independent events, their simultaneous materialization
has the probability

$(1 - y_{n-r})qp^r.$

At the same time, this is the probability of the second hypothesis.
Adding it to $y_n$, we must obtain the total probability $y_{n+1}$. Thus

(3) $y_{n+1} = y_n + (1 - y_{n-r})p^r q$

and this is an ordinary linear difference equation of the order r + 1.
Together with the obvious initial conditions

$y_0 = y_1 = \cdots = y_{r-1} = 0, \quad y_r = p^r$

it serves to determine $y_n$ completely for n = r + 1, r + 2, . . . . For
instance, taking n = r, we derive from (3)

$y_{r+1} = p^r + p^r q.$

Again, taking n = r + 1, we obtain

$y_{r+2} = p^r + 2p^r q$

and so forth. Although, proceeding thus, step by step, we can find the
required probability for any given n, this method becomes very laborious
for large n and does not supply us with information as to the behavior
of $y_n$ for large n. It is preferable, therefore, to apply known methods of
solution to equation (3). First we can obtain a homogeneous equation
by introducing $z_n = 1 - y_n$ instead of $y_n$. The resulting equation in
$z_n$ is

(4) $z_{n+1} - z_n + qp^r z_{n-r} = 0$

and the corresponding initial conditions are:

$z_0 = z_1 = \cdots = z_{r-1} = 1; \quad z_r = 1 - p^r.$

We could use the method of particular solutions as in the preceding
problem, but it is more convenient to use the method of generating
functions. The power series in $\xi$

$\varphi(\xi) = z_0 + z_1\xi + z_2\xi^2 + \cdots$

is the so-called generating function of the sequence $z_0, z_1, z_2, \ldots$.
If we succeed in finding its sum as a definite function of $\xi$, the development
of this function into power series will have precisely $z_n$ as the coefficient
of $\xi^n$. To obtain $\varphi(\xi)$ let us multiply both members of the preceding
series by the polynomial

$1 - \xi + qp^r\xi^{r+1}.$

The multiplication performed, we have

$(1 - \xi + qp^r\xi^{r+1})\varphi(\xi) = z_0 + (z_1 - z_0)\xi + \cdots + (z_{r-1} - z_{r-2})\xi^{r-1}$
$\qquad + (z_r - z_{r-1})\xi^r + (z_{r+1} - z_r + qp^r z_0)\xi^{r+1} + \cdots.$

In the right-hand member the terms involving $\xi^{r+1}, \xi^{r+2}, \ldots$ have
vanishing coefficients by virtue of equation (4); also $z_k - z_{k-1} = 0$ for
k = 1, 2, 3, . . . r - 1, while

$z_0 = 1$ and $z_r - z_{r-1} = -p^r$

so that

$(1 - \xi + qp^r\xi^{r+1})\varphi(\xi) = 1 - p^r\xi^r$

and

$\varphi(\xi) = \dfrac{1 - p^r\xi^r}{1 - \xi + qp^r\xi^{r+1}}.$

The generating function $\varphi(\xi)$ thus is a rational function and can be
developed into a power series of $\xi$ according to the known rules. The
coefficient of $\xi^n$ gives the general expression for $z_n$. Without any difficulty,
we find the following expression for $z_n$:

(5) $z_n = \beta_{n,r} - p^r\beta_{n-r,r}$

where

$\beta_{n,r} = \sum_{l} (-1)^l C_{n-lr}^{\,l}(qp^r)^l, \quad 0 \le l \le \dfrac{n}{r+1},$

and $\beta_{n-r,r}$ is obtained by substituting n - r instead of n. If n is not very
large compared with r, formula (5) can be used to compute $z_n$ and

$y_n = 1 - z_n.$

For instance, if n = 20, r = 5, and p = q = 1/2, we easily find

$\beta_{20,5} = 1 - \dfrac{15}{64} + \dfrac{45}{64^2} - \dfrac{10}{64^3}, \quad \beta_{15,5} = 1 - \dfrac{10}{64} + \dfrac{10}{64^2}$

and hence

$z_{20} = 0.75013$

correct to five decimals; $y_{20} = 0.24987$ is the probability of a run of 5
heads in 20 tossings of a coin.
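
Both routes to $y_n$, the step-by-step use of equation (3) and the explicit formula (5), are easy to compare by machine. The sketch below uses hypothetical function names and exact fractions; it illustrates the two formulas and is not a prescribed method of computation.

```python
from fractions import Fraction
from math import comb

def run_probability(n, r, p):
    """y_n from recurrence (3) with y_0 = ... = y_{r-1} = 0, y_r = p^r."""
    if n < r:
        return Fraction(0)
    q = 1 - p
    y = [Fraction(0)] * (n + 1)
    y[r] = p ** r
    for m in range(r, n):
        y[m + 1] = y[m] + (1 - y[m - r]) * p ** r * q
    return y[n]

def beta_sum(n, r, p):
    """beta_{n,r}: sum over l with l(r+1) <= n of (-1)^l C(n-lr, l) (q p^r)^l."""
    a = (1 - p) * p ** r
    return sum((-1) ** l * comb(n - l * r, l) * a ** l
               for l in range(n // (r + 1) + 1))

def run_probability_formula(n, r, p):
    """y_n = 1 - z_n with z_n given by formula (5); intended for n >= r."""
    z = beta_sum(n, r, p) - p ** r * beta_sum(n - r, r, p)
    return 1 - z

half = Fraction(1, 2)
assert run_probability(20, 5, half) == run_probability_formula(20, 5, half)
print(round(float(run_probability(20, 5, half)), 5))  # 0.24987, as in the text
```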

4. But if n is large in comparison with r, formula (5) would require 
so much labor that it is preferable to seek for an approximate expression 
for Zn which will be useful for large values of n. It often happens, and 
in many branches of mathematics, but especially so in the theory of 
probability, that exact solutions of problems in certain cases are not of 
any use. That raises the question of how to supplant them by convenient
approximate formulas that readily yield the required numbers.
Therefore, it is an important problem to find approximate formulas where 
exact ones cease to work. Owing to the general importance of approxi- 
mations, it will not be out of order to enter into a somewhat long and 
complicated investigation to obtain a workable approximate solution 
of our problem in the interesting case of a large n. 

Since $\varphi(\xi)$ is a rational function, the natural way to get an appropriate
expression of $z_n$ would be to resolve $\varphi(\xi)$ into simple fractions, corresponding
to various roots of the denominator, and expand those fractions in
power series of $\xi$. However, to attain definite conclusions following this
method, we must first seek information concerning roots of the equation

$1 - \xi + qp^r\xi^{r+1} = 0.$

5. Let

$f(\xi) = \xi - 1 - a\xi^{r+1}$

where

$a = p^r(1 - p).$

When p varies from 0 to 1, the maximum of $p^r(1 - p)$ is attained for
$p = \dfrac{r}{r+1}$ and is $r^r/(r+1)^{r+1}$, so that $a \le r^r/(r+1)^{r+1}$ in all cases.
To deal with the most interesting case, we shall assume

(6) $p < \dfrac{r}{r+1}$

which involves

$a < \dfrac{r^r}{(r+1)^{r+1}}$

and we leave it to the reader to discover how the following discussion
should be modified if $p \ge \dfrac{r}{r+1}$.

When $\xi$ starts to increase from 0, the function $f(\xi)$ steadily increases
and attains a positive maximum for $\xi = \xi_0$ where

$(r + 1)a\xi_0^r = 1$

after which $f(\xi)$ decreases steadily to negative infinity. Hence, there
are two positive roots of the equation $f(\xi) = 0$: $\xi_1$, which is less than
$\xi_0$, and another root greater than this number. This root is 1/p if
condition (6) is fulfilled.

The remaining roots are all imaginary if r is odd and there is one 
negative root among them if r is even. 

Now we shall prove that the absolute value of every imaginary or
negative root is > 1/p. Let $\rho$ be the absolute value of any such root.
We have first

$f(\rho) = \rho - 1 - a\rho^{r+1} < 0$

so that $\rho$ belongs either to the interval $(0, \xi_1)$ or to the interval $(1/p, +\infty)$
and if we can show that $\rho > \xi_0$ then $\rho$ can be only > 1/p. If the root we
consider is negative, $\rho$ satisfies the equation

$F(\rho) = 1 + \rho - a\rho^{r+1} = 0$

and since $F(\rho)$ increases till a positive maximum for $\rho = \xi_0$ is reached, and
then decreases, the root of $F(\rho) = 0$ is necessarily $> \xi_0$. If $\xi = \rho e^{i\theta}$ is
an imaginary root of $f(\xi) = 0$ we have, equating imaginary parts,

(7) $a\rho^r \dfrac{\sin (r + 1)\theta}{\sin\theta} = 1.$

But, whatever $\theta$ may be,

$\left|\dfrac{\sin (r + 1)\theta}{\sin\theta}\right| \le r + 1,$¹

the equality sign being excluded if $\sin\theta \neq 0$. Hence,

$(r + 1)a\rho^r > 1$

which implies $\rho > \xi_0$. The statement is thus completely proved.
6. The equation

$\xi - 1 - a\xi^{r+1} = 0$

can be exhibited in the form

$\dfrac{1}{\xi} + a\xi^r = 1.$

Substituting $\xi = \rho e^{i\theta}$ here, and again equating imaginary parts, we get

$a\rho^{r+1}\sin r\theta = \sin\theta$

and, combining this with (7),

$\rho = \dfrac{\sin (r + 1)\theta}{\sin r\theta}, \quad a = \dfrac{(\sin r\theta)^r \sin\theta}{[\sin (r + 1)\theta]^{r+1}}.$

¹ The extreme values of the ratio $\dfrac{\sin m\theta}{\sin\theta}$ (m integer > 1) correspond to certain
roots of the equation $m\sin\theta\cos m\theta = \sin m\theta\cos\theta$, but for every root of this equation

$\left|\dfrac{\sin m\theta}{\sin\theta}\right| = \dfrac{m}{\sqrt{1 + (m^2 - 1)\sin^2\theta}} \le m.$

The equality sign is excluded if $\sin\theta$ differs from 0.

If the imaginary part of $\xi$ is positive, the argument $\theta$ is contained
between 0 and $\pi$. In this case, it cannot be less than $\dfrac{\pi}{r+1}$ or greater
than $\pi - \dfrac{\pi}{r+1}$. For, if $0 < \theta < \dfrac{\pi}{r+1}$,

$\dfrac{\sin r\theta}{r\theta} > \dfrac{\sin (r + 1)\theta}{(r + 1)\theta} \quad\text{or}\quad \dfrac{\sin r\theta}{\sin (r + 1)\theta} > \dfrac{r}{r + 1}.$

At the same time

$\dfrac{\sin\theta}{\sin (r + 1)\theta} > \dfrac{1}{r + 1}$

and hence

$a = \left[\dfrac{\sin r\theta}{\sin (r + 1)\theta}\right]^r \dfrac{\sin\theta}{\sin (r + 1)\theta} > \dfrac{r^r}{(r + 1)^{r+1}}$

which is impossible. That $\theta$ cannot be greater than $\pi - \dfrac{\pi}{r+1}$ follows
simply, because in this case, $\sin (r + 1)\theta$ and $\sin r\theta$ would be of opposite
signs and $\rho$ would be negative.

As $\dfrac{\pi}{r+1} \le \theta \le \pi - \dfrac{\pi}{r+1}$, we have

$\rho\sin\theta \ge \rho\sin\dfrac{\pi}{r+1}.$

On the other hand, $\sin x > 2x/\pi$ if $0 < x < \pi/2$ and $\rho > 1/p$. Hence,

$\rho\sin\theta > \dfrac{2}{(r + 1)p}.$

Thus, imaginary parts of all complex roots have the same lower bound

$\dfrac{2}{(r + 1)p}$

of their absolute values.

7. Denoting the roots of the equation $f(\xi) = 0$ by

$\xi_k \quad (k = 1, 2, \ldots r + 1)$

we have

$\varphi(\xi) = \sum_{k=1}^{r+1} \dfrac{1 - p\xi_k}{(1 - p)(r + 1 - r\xi_k)} \cdot \dfrac{1}{\xi_k - \xi}.$

Hence, expanding each term into power series of $\xi$ and collecting
coefficients of $\xi^n$, we find

$z_n = \sum_{k=1}^{r+1} \dfrac{1 - p\xi_k}{(1 - p)(r + 1 - r\xi_k)}\,\xi_k^{-n-1}.$

For every imaginary root, we have

$\left|\dfrac{(1 - p\xi_k)\xi_k^{-n}}{(1 - p)\xi_k(r + 1 - r\xi_k)}\right| < \dfrac{(r + 1)p^{n+2}}{r(1 - p)}$

since

$\left|\dfrac{1}{\xi_k}\right| < p; \quad \dfrac{|1 - p\xi_k|}{|\xi_k|} < 2p; \quad \dfrac{1}{|r + 1 - r\xi_k|} < \dfrac{(r + 1)p}{2r}.$

If r is odd, there are r - 1 imaginary roots and the part in the expression
of $z_n$ due to them in absolute value is less than

$\dfrac{(r + 1)(r - 1)}{r(1 - p)}p^{n+2} < \dfrac{rp^{n+2}}{1 - p}.$

The term corresponding to the root 1/p vanishes, so that finally

$z_n = \dfrac{1 - p\xi_1}{(1 - p)\xi_1} \cdot \dfrac{\xi_1^{-n}}{r + 1 - r\xi_1} + \theta\dfrac{rp^{n+2}}{1 - p}$

where $|\theta| < 1$ and $\xi_1$ denotes the least positive root of the equation

$1 - \xi + qp^r\xi^{r+1} = 0.$

If r is even, there is one negative root. The part of $z_n$ corresponding
to this root is less than

$\dfrac{2p^{n+2}}{(1 - p)r}.$

The whole contribution due to imaginary and negative roots is less than

$\dfrac{(r + 1)(r - 2) + 2}{r(1 - p)}p^{n+2} < \dfrac{rp^{n+2}}{1 - p}$

in absolute value. Thus, no matter whether r is odd or even, we have

(8) $z_n = \dfrac{1 - p\xi_1}{(1 - p)\xi_1} \cdot \dfrac{\xi_1^{-n}}{r + 1 - r\xi_1} + \theta\dfrac{rp^{n+2}}{1 - p}; \quad -1 < \theta < 1.$

This is the required expression for $z_n$, excellently adapted to the case of a
large value for n, since then the remainder term involving $\theta$ is completely
negligible in comparison with the first principal term.

The root $\xi_1$ can be found either by direct solution of the trinomial
equation following Gauss' method, or by application of Lagrange's series.
Applying Lagrange's series, we have

$\xi_1 = 1 + a + \sum_{l=2}^{\infty} \dfrac{(lr + 2)(lr + 3) \cdots (lr + l)}{2 \cdot 3 \cdots l}a^l$

$\log \xi_1 = a + \sum_{l=2}^{\infty} \dfrac{(lr + 1)(lr + 2) \cdots (lr + l - 1)}{2 \cdot 3 \cdots l}a^l$

both series being convergent if $|a| < r^r/(r + 1)^{r+1}$ and this condition is
satisfied.

8. Let us apply the approximate formula (8) to the case p = q = 1/2
and r = 10. Using Lagrange's series, we find that

$\xi_1 = 1.0004909$

and

$z_n = 1.003947 \cdot (1.0004909)^{-n} + \dfrac{5\theta}{2^n}.$

Hence, for n = 100, 1,000, 10,000, respectively,

$z_n = 0.9559;\ 0.6146;\ 0.0074$

so that, for instance, the probabilities of a run of at least 10 heads in 
100, 1,000, or 10,000 throws of a coin are, respectively, 

0.0441; 0.3854; 0.9926. 

Thus, in 10,000 throws, it is quite likely that heads would turn up 10 or 
more times in succession. 

In general, for a given r and increasing n, the probability $y_n$ tends to 1,
so that in a very long series of trials, runs of any length are extremely 
likely to occur, a conclusion which at first sight seems paradoxical. 
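
The approximate computation of this section can be imitated in a few lines. What follows is a sketch under stated assumptions: the function names are hypothetical, the root $\xi_1$ is summed from Lagrange's series in floating point with an arbitrary number of terms, only the principal term of formula (8) is kept, and recurrence (4) is included merely as an independent check.

```python
def least_root(r, p, terms=40):
    """xi_1 from Lagrange's series in a = p^r (1 - p)."""
    a = p ** r * (1 - p)
    xi = 1.0 + a
    for l in range(2, terms + 1):
        coeff = 1.0
        for j in range(2, l + 1):
            coeff *= (l * r + j) / j      # (lr+2)...(lr+l) / (2*3*...*l)
        xi += coeff * a ** l
    return xi

def z_principal(n, r, p):
    """Principal term of formula (8); the remainder term is dropped."""
    xi = least_root(r, p)
    return (1 - p * xi) / ((1 - p) * xi * (r + 1 - r * xi)) * xi ** (-n)

def z_exact(n, r, p):
    """z_n by recurrence (4) in floating point, for comparison."""
    z = [1.0] * (n + 1)
    if n >= r:
        z[r] = 1 - p ** r
    for m in range(r, n):
        z[m + 1] = z[m] - (1 - p) * p ** r * z[m - r]
    return z[n]

for n in (100, 1000, 10000):
    # Probability of a run of at least 10 heads in n throws.
    print(n, round(1 - z_principal(n, 10, 0.5), 4))
```

For n = 100 the principal term and the recurrence agree to many decimals, as the smallness of the remainder term suggests.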

9. In the preceding examples, an unknown probability was deter- 
mined by an ordinary equation in finite differences. Very often, how- 
ever, probability as a function of two or more independent variables is 
defined by a partial difference equation in two or more independent 
variables, together with a set of initial conditions suggested by the 
problem itself. A few examples will suffice to illustrate the use of 
partial equations in finite differences and to give an idea of the two 
principal methods for their solution; namely, Laplace’s method of 
generating functions, and the less well known, but elegant, method 
proposed by Lagrange. 

We start with an analytical solution of the problem which was discussed
in detail in Chap. III.

Problem 3. Find the probability of exactly x successes in t independent
trials with the constant probability p.

Solution by Laplace's Method. Let us denote the required probability
by $y_{x,t}$. To obtain x successes in t trials can be possible only in
two mutually exclusive ways: (a) by obtaining x successes in t - 1 trials
and a failure at the last trial; (b) by obtaining success at the last trial
and x - 1 successes in the preceding t - 1 trials. The probability of
case (a) is $qy_{x,t-1}$ and that of case (b) is $py_{x-1,t-1}$. The total probability
$y_{x,t}$ satisfies the equation

(9) $y_{x,t} = py_{x-1,t-1} + qy_{x,t-1}$

for all positive x and t. This equation alone does not determine $y_{x,t}$
completely, but it does so in connection with certain initial conditions.
These conditions are

(10) $y_{x,0} = 0$ if x > 0; $\quad y_{0,t} = q^t$ if $t \ge 0$.

The first set of equations is obvious; the second set is the expression
of the fact that if there are no successes in t trials, the failures occur t
times in succession, and the probability for that is $q^t$.

Following Laplace, we introduce for a given t the generating function
of $y_{0,t}, y_{1,t}, y_{2,t}, \ldots$, that is, the power series

$\varphi_t(\xi) = y_{0,t} + y_{1,t}\xi + y_{2,t}\xi^2 + \cdots = \sum_{x=0}^{\infty} y_{x,t}\xi^x.$

Taking t - 1 instead of t, separating the first term and multiplying by
q, we have

$q\varphi_{t-1}(\xi) = qy_{0,t-1} + \sum_{x=1}^{\infty} qy_{x,t-1}\xi^x;$

and similarly

$p\xi\varphi_{t-1}(\xi) = \sum_{x=1}^{\infty} py_{x-1,t-1}\xi^x.$

Adding and noting equation (9) we obtain

$(p\xi + q)\varphi_{t-1}(\xi) = \varphi_t(\xi) + qy_{0,t-1} - y_{0,t},$

but because of (10)

$qy_{0,t-1} - y_{0,t} = q^t - q^t = 0$

and hence,

$\varphi_t(\xi) = (p\xi + q)\varphi_{t-1}(\xi)$

for every positive t. Taking t = 1, 2, 3, . . . and performing successive
substitutions, we get

$\varphi_t(\xi) = (p\xi + q)^t\varphi_0(\xi)$

and it remains only to find

$\varphi_0(\xi) = y_{0,0} + y_{1,0}\xi + y_{2,0}\xi^2 + \cdots.$

But on account of (10), $y_{x,0} = 0$ for x > 0, while $y_{0,0} = 1$. Thus,

$\varphi_0(\xi) = 1$

and

$\varphi_t(\xi) = (p\xi + q)^t.$

To find $y_{x,t}$ it remains to develop the right-hand member in a power series
of $\xi$ and to find the coefficient of $\xi^x$. The binomial theorem readily gives

$y_{x,t} = \dfrac{t(t - 1) \cdots (t - x + 1)}{1 \cdot 2 \cdots x}p^x q^{t-x}.$
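
The passage from $\varphi_{t-1}$ to $\varphi_t$ is a multiplication by the linear factor $p\xi + q$, which can be carried out on the list of coefficients. The sketch below, with hypothetical function names and an arbitrarily chosen p, builds $\varphi_t$ in this way and compares its coefficients with the binomial expression.

```python
from fractions import Fraction
from math import comb

def multiply_by_linear(poly, p, q):
    """Coefficients of (p*xi + q) * poly, lowest power of xi first."""
    out = [Fraction(0)] * (len(poly) + 1)
    for x, c in enumerate(poly):
        out[x] += q * c          # failure at the new trial
        out[x + 1] += p * c      # success at the new trial
    return out

def bernoulli_distribution(t, p):
    """Coefficients of phi_t(xi) = (p*xi + q)^t, starting from phi_0 = 1."""
    poly = [Fraction(1)]
    for _ in range(t):
        poly = multiply_by_linear(poly, p, 1 - p)
    return poly

p, t = Fraction(1, 3), 6         # hypothetical values for illustration
dist = bernoulli_distribution(t, p)
for x in range(t + 1):
    assert dist[x] == comb(t, x) * p ** x * (1 - p) ** (t - x)
```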




10. Poisson’s Series of Trials. The analytical method thus enables 
us to find the same expression for probabilities in a Bernoullian series 
of trials as that obtained in Chap. Ill by elementary means. Considering 
how simple it is to arrive at this expression, it may appear that a new 
deduction of a known result is not a great gain. But one must bear in 
mind that a little modification of the problem may bring new difficulties 
which may be more easily overcome by the new method than by a generalization
of the old one. Poisson substituted for the Bernoullian series
another series of independent trials with probability varying from
trial to trial, so that in trials 1, 2, 3, 4, . . . the same event E has different
probabilities $p_1, p_2, p_3, p_4, \ldots$ and correspondingly, the opposite event
has probabilities $q_1, q_2, q_3, q_4, \ldots$ where $q_k = 1 - p_k$ in general. Now,
for the Poisson series, the same question may be asked: what is the
probability $y_{x,t}$ of obtaining x successes in t trials? The solution of this
generalized problem is easier and more elegant if we make use of difference
equations.

First, in the same manner as before, we can establish the equation in
finite differences

(11) $y_{x,t} = p_t y_{x-1,t-1} + q_t y_{x,t-1}.$

The corresponding set of initial conditions is

(12) $y_{x,0} = 0$ if x > 0; $\quad y_{0,t} = q_1 q_2 \cdots q_t$ if t > 0; $\quad y_{0,0} = 1.$

Giving $\varphi_t(\xi)$ the same meaning as above, we have

$q_t\varphi_{t-1}(\xi) = q_t y_{0,t-1} + \sum_{x=1}^{\infty} q_t y_{x,t-1}\xi^x$

$p_t\xi\varphi_{t-1}(\xi) = \sum_{x=1}^{\infty} p_t y_{x-1,t-1}\xi^x,$

whence

$(p_t\xi + q_t)\varphi_{t-1}(\xi) = \varphi_t(\xi) + q_t y_{0,t-1} - y_{0,t};$

but because of (12)

$q_t y_{0,t-1} - y_{0,t} = q_1 q_2 \cdots q_t - q_1 q_2 \cdots q_t = 0,$

and thus

$\varphi_t(\xi) = (p_t\xi + q_t)\varphi_{t-1}(\xi)$

whence again

$\varphi_t(\xi) = (p_1\xi + q_1)(p_2\xi + q_2) \cdots (p_t\xi + q_t)\varphi_0(\xi).$

However, by virtue of (12), $\varphi_0(\xi) = 1$ so that finally

$\varphi_t(\xi) = (p_1\xi + q_1)(p_2\xi + q_2) \cdots (p_t\xi + q_t).$

To find the probability of x successes in t trials in Poisson's case, one
needs only to develop the product

$(p_1\xi + q_1)(p_2\xi + q_2) \cdots (p_t\xi + q_t)$

according to ascending powers of $\xi$ and to find the coefficient of $\xi^x$.
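
Developing the product is a purely mechanical task. As an illustration, the sketch below (the function name is hypothetical) expands the product for the seven urns of Problem 6 at the end of this chapter, taking as $p_k$ the proportion of white balls in the kth urn, and reads off the coefficient of $\xi^3$.

```python
from fractions import Fraction

def success_distribution(probabilities):
    """Coefficients of prod (p_k*xi + q_k), lowest power of xi first."""
    poly = [Fraction(1)]                      # phi_0(xi) = 1
    for p in probabilities:
        q = 1 - p
        nxt = [Fraction(0)] * (len(poly) + 1)
        for x, c in enumerate(poly):
            nxt[x] += q * c                   # failure keeps x successes
            nxt[x + 1] += p * c               # success raises x by one
        poly = nxt
    return poly

# Proportions of white balls in the seven urns of Problem 6.
white = [Fraction(1, 3), Fraction(2, 3), Fraction(2, 4), Fraction(3, 4),
         Fraction(2, 7), Fraction(3, 5), Fraction(4, 9)]
dist = success_distribution(white)
print(dist[3], round(float(dist[3]), 5))      # 227/810 0.28025
```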

11. Solution by Lagrange's Method. We shall now apply to equation (9)
the ingenious method devised by Lagrange, with a slight modification
intended to bring into full light the fundamental idea underlying this
method. Equation (9) possesses particular solutions of the form

$\alpha^x\beta^t$

if $\alpha$ and $\beta$ are connected by the equation

$\alpha\beta = p + q\alpha.$

Solving this equation for $\beta$, we find infinitely many particular solutions

$\alpha^x(q + p\alpha^{-1})^t$

where $\alpha$ is absolutely arbitrary. Multiplying this expression by an
arbitrary function $\varphi(\alpha)$ and integrating between arbitrary limits, we
obtain other solutions of equation (9). Now the question arises of how
to choose $\varphi(\alpha)$ and the path of integration to satisfy not only equation (9)
but also initial conditions (10). We shall assume that $\varphi(\alpha)$ is a regular
function of a complex variable $\alpha$ in a ring between two concentric circles,
with their center at the origin, and that it can therefore be represented in
this ring by Laurent's series

$\varphi(\alpha) = \sum_{n=-\infty}^{\infty} c_n\alpha^n.$

If c is a circle concentric with the regularity ring of $\varphi(\alpha)$ and situated
inside it, the integral

$y_{x,t} = \dfrac{1}{2\pi i}\int_c \alpha^{x-1}(q + p\alpha^{-1})^t\varphi(\alpha)\,d\alpha$

is perfectly determined and represents a solution of (9). To satisfy
the initial conditions, we have first the set of equations

$\dfrac{1}{2\pi i}\int_c \alpha^{x-1}\varphi(\alpha)\,d\alpha = 0$ for x = 1, 2, 3, . . .

which show that all the coefficients $c_n$ with negative subscripts vanish,
and that $\varphi(\alpha)$ is regular about the origin. The second set of equations
obtained by setting x = 0

$\dfrac{1}{2\pi i}\int_c (q + p\alpha^{-1})^t\,\dfrac{\varphi(\alpha)}{\alpha}\,d\alpha = q^t$ for t = 0, 1, 2, . . .

serves to determine $\varphi(\alpha)$. If $\epsilon$ is a sufficiently small complex parameter,
this set of equations is entirely equivalent to a single equation:

$\dfrac{1}{2\pi i}\int_c \dfrac{\varphi(\alpha)\,d\alpha}{\alpha - \epsilon(p + q\alpha)} = \dfrac{1}{1 - \epsilon q}.$

Now the integrand within the circle c has a single pole $\alpha_0$ determined by
the equation

$\alpha_0 = \epsilon(p + q\alpha_0)$

and the corresponding residue is

$\dfrac{\varphi(\alpha_0)}{1 - q\epsilon}.$

At the same time, this is the value of the left-hand member of the above
equation, so that

$\dfrac{\varphi(\alpha_0)}{1 - q\epsilon} = \dfrac{1}{1 - q\epsilon}$

or

$\varphi(\alpha_0) = 1$

for all sufficiently small $\epsilon$ or $\alpha_0$. That is, $\varphi(\alpha) = 1$ and

$y_{x,t} = \dfrac{1}{2\pi i}\int_c \alpha^{x-1}\left(q + \dfrac{p}{\alpha}\right)^t d\alpha$

is the required solution. It remains to find the residue of the integrand;
that is, the coefficient of $1/\alpha$ in the development of

$\alpha^{x-1}\left(q + \dfrac{p}{\alpha}\right)^t$

in series of ascending powers of $\alpha$. That can be easily done, using the
binomial development, and we obtain

$y_{x,t} = C_t^x p^x q^{t-x}$

as it should be.

12. Problem 4. Two players, A and B, agree to play a series of
games on the condition that A wins the series if he succeeds in winning a
games before B wins b games. The probability of winning a single game
is p for A and q = 1 - p for B, so that each game must be won by either
A or B. What is the probability that A will win the series?

Solution. This historically important problem was proposed as an
exercise (Prob. 12, page 58) with a brief indication of its solution based
on elementary principles. To solve it analytically, let us denote by
$y_{x,t}$ the probability that A will win when x games remain for him to win,
while his adversary B has t games left to win. Considering the result
of the game immediately following, we distinguish two alternatives:
(a) A wins the next game (probability p) and has to win x - 1 games
before B wins t games (probability $y_{x-1,t}$); (b) A loses the next game
(probability q) and has to win x games before B can win t - 1 games
(probability $y_{x,t-1}$). The probabilities of these two alternatives being
$py_{x-1,t}$ and $qy_{x,t-1}$, their sum is the total probability $y_{x,t}$. Thus, $y_{x,t}$
satisfies the equation

(13) $y_{x,t} = py_{x-1,t} + qy_{x,t-1}.$

Now, $y_{x,0} = 0$ for x > 0, which means that A cannot win, B having
won all his games. Also, $y_{0,t} = 1$ for t > 0, which means that A surely
wins when he has no more games to win. The initial conditions in our
problem are, therefore,

(14) $y_{x,0} = 0$ if x > 0; $\quad y_{0,t} = 1$ if t > 0.

The symbol $y_{0,0}$ has no meaning as a probability, and remains undefined.
For the sake of simplicity we shall assume, however, that $y_{0,0} = 0$.
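Application of Laplace's Method. Again, let

$\varphi_x(\xi) = y_{x,0} + y_{x,1}\xi + y_{x,2}\xi^2 + \cdots$

be the generating function of the sequence $y_{x,0}, y_{x,1}, y_{x,2}, \ldots$ corresponding
to an arbitrary x > 0. We have

$q\xi\varphi_x(\xi) = \sum_{t=1}^{\infty} qy_{x,t-1}\xi^t$

$p\varphi_{x-1}(\xi) = py_{x-1,0} + \sum_{t=1}^{\infty} py_{x-1,t}\xi^t$

and

$q\xi\varphi_x(\xi) + p\varphi_{x-1}(\xi) = py_{x-1,0} + \sum_{t=1}^{\infty} (py_{x-1,t} + qy_{x,t-1})\xi^t$

or, because of (13),

$q\xi\varphi_x(\xi) + p\varphi_{x-1}(\xi) = py_{x-1,0} - y_{x,0} + \varphi_x(\xi).$

Now, for every x > 0

$y_{x,0} = y_{x-1,0} = 0$

in conformity with the first set of initial conditions, which allows us to
present the preceding relation as follows:

$\varphi_x(\xi) = \dfrac{p}{1 - q\xi}\varphi_{x-1}(\xi)$

whence

$\varphi_x(\xi) = \dfrac{p^x}{(1 - q\xi)^x}\varphi_0(\xi).$

But

$\varphi_0(\xi) = y_{0,0} + y_{0,1}\xi + y_{0,2}\xi^2 + \cdots = \xi + \xi^2 + \xi^3 + \cdots = \dfrac{\xi}{1 - \xi}$

and finally

$\varphi_x(\xi) = \dfrac{p^x\xi}{(1 - \xi)(1 - q\xi)^x}.$

It remains to develop the right-hand member in a power series of $\xi$ and
find the coefficient of $\xi^t$. As

$\dfrac{\xi}{1 - \xi} = \xi + \xi^2 + \xi^3 + \cdots$

and

$\dfrac{1}{(1 - q\xi)^x} = 1 + \dfrac{x}{1}q\xi + \dfrac{x(x + 1)}{1 \cdot 2}q^2\xi^2 + \cdots$

we readily get, multiplying these series according to the ordinary rules,

$y_{x,t} = p^x\left[1 + \dfrac{x}{1}q + \dfrac{x(x + 1)}{1 \cdot 2}q^2 + \cdots + \dfrac{x(x + 1) \cdots (x + t - 2)}{1 \cdot 2 \cdots (t - 1)}q^{t-1}\right]$

which coincides with the elementary solution indicated on page 58.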
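
Equation (13) with conditions (14) lends itself to direct recursive evaluation, which gives an independent check of the series just obtained. The sketch below uses hypothetical function names and assumes positive a and b.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def points_by_recurrence(a, b, p):
    """y_{a,b} from equation (13) with the initial conditions (14)."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def y(x, t):
        if x == 0:
            return Fraction(1)      # A has no more games to win
        if t == 0:
            return Fraction(0)      # B has won all his games
        return p * y(x - 1, t) + q * y(x, t - 1)

    return y(a, b)

def points_closed_form(a, b, p):
    """p^a [1 + a q + a(a+1)/2 q^2 + ...], the series stopping at q^(b-1)."""
    q = 1 - p
    return p ** a * sum(comb(a + i - 1, i) * q ** i for i in range(b))

p = Fraction(2, 5)                  # hypothetical value for illustration
assert points_by_recurrence(3, 4, p) == points_closed_form(3, 4, p)
print(points_closed_form(3, 3, Fraction(1, 2)))  # 1/2, as symmetry requires
```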

Application of Lagrange's Method. Equation (13) has particular
solutions of the form

$\alpha^x\beta^t$

where

$\alpha\beta = p\beta + q\alpha.$

Hence, we can either express $\alpha$ by $\beta$ or $\beta$ by $\alpha$. Leaving it to the reader
to follow the second alternative, we shall express $\alpha$ as a function of $\beta$
and seek the required solution in the form

$y_{x,t} = \dfrac{1}{2\pi i}\int_c \dfrac{p^x\beta^t\varphi(\beta)\,d\beta}{(1 - q\beta^{-1})^x}$

where $\varphi(\beta)$ is again supposed to be developable in Laurent's series in a
certain ring; c is a circle described about the origin and entirely within
that ring. Setting x = 0, we must have

$\dfrac{1}{2\pi i}\int_c \beta^t\varphi(\beta)\,d\beta = 1$ for t = 1, 2, 3, . . .

and this set of equations is satisfied if we take

$\varphi(\beta) = \dfrac{1}{\beta^2} + \dfrac{1}{\beta^3} + \cdots = \dfrac{1}{\beta(\beta - 1)}.$

Now we have

$y_{x,t} = \dfrac{p^x}{2\pi i}\int_c \dfrac{\beta^{t-1}\,d\beta}{(1 - q\beta^{-1})^x(\beta - 1)}$

and for t = 0

$y_{x,0} = \dfrac{p^x}{2\pi i}\int_c \dfrac{d\beta}{\beta(1 - q\beta^{-1})^x(\beta - 1)} = 0$

as it should be, because for $|\beta| > 1$ the integrand can be developed into a
power series of $1/\beta$, the term with $1/\beta$ being absent. Thus, the required
solution is given by

$y_{x,t} = \dfrac{p^x}{2\pi i}\int_c \dfrac{\beta^{t-1}\,d\beta}{(1 - q\beta^{-1})^x(\beta - 1)}$

where c is a circle of radius > 1 described about the origin. The final
expression for $y_{x,t}$ is obtained as the coefficient of $1/\beta$ in the development
of

$\dfrac{p^x\beta^{t-1}}{(1 - q\beta^{-1})^x(\beta - 1)}$

into power series of $1/\beta$. We obtain the same expression as before.

Problems for Solution 

1. Each of n urns contains a white and b black balls. One ball is transferred from
the first urn into the second, another one from the second into the third, and so on.
Finally, a ball is drawn from the nth urn. What is the probability that it is white,
when it is known that the first ball transferred was white?

Ans. $\dfrac{a}{a + b} + \dfrac{b}{a + b}(a + b + 1)^{-n+1}.$

2. Two urns contain, respectively, a white and b black, and b white and a black
balls. A series of drawings is made, according to the following rules:

a. Each time only one ball is drawn and immediately returned to the same urn it
came from.

b. If the ball drawn is white, the next drawing is made from the first urn.

c. If it is black, the next drawing is made from the second urn.

d. The first ball drawn comes from the first urn.

What is the probability that the nth ball drawn will be white?

Ans. $p_n = \dfrac{1}{2} + \dfrac{1}{2}\left(\dfrac{a - b}{a + b}\right)^n.$

3. Find the probability of a run of 5 in a series of 15 trials with constant probability
p = 1/3. Ans. $y_{15} = 23 \cdot 3^{-6} - 70 \cdot 3^{-12} = 0.0314184.$

4. How many throws of a coin suffice to give a probability of more than 0.999 for
a run of at least 100 heads? Ans. $1.76 \cdot 10^{31}$ throws suffice.

5. What is the least number of trials assuring a probability of 1/2 for a run of at
least 10 successes if p = q = 1/2? Ans. 1,420.

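6. Seven urns contain black and white balls in the following proportions:

Urns     1   2   3   4   5   6   7
White    1   2   2   3   2   3   4
Black    2   1   2   1   5   2   5

One ball is drawn from each urn. What is the probability that there will be among
them exactly 3 white balls? Ans. Coefficient of $\xi^3$ in

$\left(\tfrac{1}{3}\xi + \tfrac{2}{3}\right)\left(\tfrac{2}{3}\xi + \tfrac{1}{3}\right)\left(\tfrac{1}{2}\xi + \tfrac{1}{2}\right)\left(\tfrac{3}{4}\xi + \tfrac{1}{4}\right)\left(\tfrac{2}{7}\xi + \tfrac{5}{7}\right)\left(\tfrac{3}{5}\xi + \tfrac{2}{5}\right)\left(\tfrac{4}{9}\xi + \tfrac{5}{9}\right)$

or

$\tfrac{227}{810} = 0.28025.$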


7. Two players, each possessing $2, agree to play a series of games. The probability
of winning a single game is 1/2 for both, and the loser pays $1 to his adversary
after each game. Find the probability for each one of them to be ruined at or before
the nth game.

Solution. Let $y_m$ be the probability that after playing 2m games, neither of the
players is ruined. We have

$y_{m+1} = \tfrac{1}{2}y_m$

and hence

$y_m = \dfrac{1}{2^m}.$

The probability for one of the players to be ruined at or before the nth game is

$\dfrac{1}{2} - \dfrac{1}{2^{m+1}}$

if n = 2m or n = 2m + 1.

8. Solve the same problem if each player enters the game with $3.

Ans. $\dfrac{1}{2} - \dfrac{3^{m-1}}{2^{2m-1}}$ if n = 2m - 1 or n = 2m.

9. Players $A_1, A_2, \ldots A_{n+1}$ play a series of games in the following order: first $A_1$
plays with $A_2$; the loser is out and the winner plays with the following player, $A_3$; the
loser is out again and the next game is played with $A_4$, and so on; the loser always being
out and his place taken by the next following player. The probability of winning a
single game is 1/2 for each player and the series is won by the player who succeeds in
winning over all his adversaries in succession. What is the probability that the
series will stop exactly at the xth game? What is the probability that the series will
stop before or at the xth game?

Solution. Let $y_x$ be the probability that the series terminates exactly at the xth
game. That means that the player who won the game entered at the (x - n + 1)st
game and won successively the n following games. Now, there are n - 1 cases
to be distinguished according as the player beaten at the (x - n + 1)st game has
already won 1, 2, 3, . . . n - 1 games. Let $P_k$ be the probability that the loser in the
(x - n + 1)st game previously has won k games. The probability of ending the
series in this case is $P_k/2^n$. On the other hand,

$y_{x-k} = \dfrac{P_k}{2^{n-k}}$

so that

$\dfrac{P_k}{2^n} = \dfrac{y_{x-k}}{2^k}.$

Hence, for x > n

$y_x = \dfrac{1}{2}y_{x-1} + \dfrac{1}{4}y_{x-2} + \cdots + \dfrac{1}{2^{n-1}}y_{x-n+1}.$

Initial conditions:

$y_1 = y_2 = \cdots = y_{n-1} = 0; \quad y_n = \dfrac{1}{2^{n-1}}.$

The generating function of $y_x$:

$y_1 + y_2\xi + y_3\xi^2 + \cdots = \dfrac{\xi^{n-1}\left(1 - \dfrac{\xi}{2}\right)}{2^{n-1}\left(1 - \xi + \dfrac{\xi^n}{2^n}\right)}$

and the generating function of the probability that the series will end before or at the
xth game is

$\dfrac{\xi^{n-1}\left(1 - \dfrac{\xi}{2}\right)}{2^{n-1}(1 - \xi)\left(1 - \xi + \dfrac{\xi^n}{2^n}\right)}.$
10. Three players, A, B, C, play a series of games, each game being won by one of
them. If the probabilities for A, B, C to win a single game are p, q, r, find the probability
of A winning a games before B and C win b and c games, respectively.

Solution. Let $A_{x,y,z}$ denote the probability for A to win the series when he has
still to win x games, while B and C have to win y and z games, respectively. First,
we can establish the equation

$A_{x,y,z} = pA_{x-1,y,z} + qA_{x,y-1,z} + rA_{x,y,z-1}.$

Next, $A_{0,y,z} = 1$ for positive y, z, and $A_{x,0,z} = 0$ for positive x, z; $A_{x,y,0} = 0$ for positive
x, y. Besides, although this is only a formal simplification, we shall assume
$A_{x,0,z} = 0$, $A_{x,y,0} = 0$ when x or y or z vanishes. For the generating function of $A_{x,y,z}$

$\varphi_x(\xi, \eta) = \sum_{y,z=0}^{\infty} A_{x,y,z}\xi^y\eta^z$

we find the equation

$\varphi_x(\xi, \eta) = \dfrac{p}{1 - q\xi - r\eta}\varphi_{x-1}(\xi, \eta)$

whence

$\varphi_x(\xi, \eta) = \dfrac{p^x\xi\eta}{(1 - \xi)(1 - \eta)(1 - q\xi - r\eta)^x}.$

The final answer is

$A_{a,b,c} = p^a\left[1 + \dfrac{a}{1}(q + r) + \dfrac{a(a + 1)}{1 \cdot 2}(q + r)^2 + \dfrac{a(a + 1)(a + 2)}{1 \cdot 2 \cdot 3}(q + r)^3 + \cdots\right]'$

the dash indicating that powers of q and r with the exponents $\ge b$ and $\ge c$ are omitted.

Obviously, the same method can be extended to any number of players, and leads
to a perfectly analogous expression of probability.

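11. An urn contains n balls altogether, and among them a white balls. In a series
of drawings, each time one ball is drawn, whatever its color may be, it is replaced by
a white ball. Find the probability $y_{x,r}$ that after r drawings there are x white balls
in the urn.

Solution. The required probability satisfies the equation

$y_{x,r+1} = \dfrac{n - x + 1}{n}y_{x-1,r} + \dfrac{x}{n}y_{x,r}.$

Besides,

$y_{a,0} = 1, \quad y_{x,0} = 0$ if $x \neq a$, $\quad y_{x,r} = 0$ if x < a.

From the preceding equation, combined with the initial conditions, we find successively

$y_{a,r} = \left(\dfrac{a}{n}\right)^r$

$y_{a+1,r} = (n - a)\left[\left(\dfrac{a + 1}{n}\right)^r - \left(\dfrac{a}{n}\right)^r\right]$

$y_{a+2,r} = \dfrac{(n - a)(n - a - 1)}{1 \cdot 2}\left[\left(\dfrac{a + 2}{n}\right)^r - 2\left(\dfrac{a + 1}{n}\right)^r + \left(\dfrac{a}{n}\right)^r\right]$

and so on.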
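
The successive expressions in Problem 11 can be checked by iterating the difference equation directly. The sketch below is only illustrative: the function names are hypothetical, and the general pattern used for $y_{a+k,r}$ (an alternating binomial sum) is an assumption extrapolated from the first cases, not a formula stated in the text.

```python
from fractions import Fraction
from math import comb

def white_distribution(n, a, r):
    """Map x -> y_{x,r}, obtained by applying the difference equation r times."""
    y = {a: Fraction(1)}
    for _ in range(r):
        nxt = {}
        for x, prob in y.items():
            # A white ball is drawn: the number of white balls stays x.
            nxt[x] = nxt.get(x, Fraction(0)) + prob * Fraction(x, n)
            if x < n:
                # A black ball is drawn and replaced by a white one.
                nxt[x + 1] = nxt.get(x + 1, Fraction(0)) + prob * Fraction(n - x, n)
        y = nxt
    return y

def assumed_closed_form(n, a, k, r):
    """Assumed pattern: C(n-a, k) * sum_j (-1)^j C(k, j) ((a+k-j)/n)^r."""
    return comb(n - a, k) * sum((-1) ** j * comb(k, j) * Fraction(a + k - j, n) ** r
                                for j in range(k + 1))

n, a, r = 7, 2, 5                  # hypothetical values for illustration
dist = white_distribution(n, a, r)
for k in range(n - a + 1):
    assert dist.get(a + k, Fraction(0)) == assumed_closed_form(n, a, k, r)
```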

12. If, in the problem of runs, p is supposed to be $> \dfrac{r}{r+1}$, prove that the probability
of a run of r in n trials is greater than

I p -p^ , >•(? + pi)\ 

\r - (r + l)pi 2 /I — Pi 

where $p_1 < \dfrac{r}{r+1}$ is a root of the equation

$p_1^r(1 - p_1) = p^r(1 - p).$

13. To find an asymptotic expression of probability for a run of r in n independent
trials, if $p \ge \dfrac{r}{r+1}$, the following proposition is of importance: Imaginary and negative
roots of the equation

(1 0<s^ — - — 

n — I 

are, in absolute value, greater than the root R > 1 of the equation

$(1 - s)R^n - R + s\cos\dfrac{2\pi}{n} = 0.$

Prove the truth of this statement. 

14. Given s urns containing the same number n of black and white balls in known 
proportions, drawings are made in the following manner: first, a single ball is drawn 
out of every urn; second, the ball drawn from the first urn is placed into the second; 
that drawn from the second is placed in the third, and so on; finally, the ball drawn 
from the last urn is placed in the first, so that again every urn contains n balls. Sup- 
posing that this operation is repeated t times, find the probability of drawing a white 
ball from the a:th urn. 

Solution. Let $y_{x,t}$ be the required probability. First, it can be shown that it
satisfies the equation

$$y_{x,t} = \left(1 - \frac{1}{n}\right) y_{x,t-1} + \frac{1}{n}\, y_{x-1,t-1}.$$

The initial probabilities $y_{1,0}, y_{2,0}, \ldots, y_{s,0}$ are known; and, moreover, the function
$y_{x,t}$ must satisfy a boundary condition of the periodic type, $y_{0,t} = y_{s,t}$. Hence,
applying Lagrange’s method, the following solution is found

$$y_{x,t} = \left(1 - \frac{1}{n}\right)^{t}\left[f(x) + \frac{t}{1}\,\frac{f(x-1)}{n-1} + \frac{t(t-1)}{1\cdot 2}\,\frac{f(x-2)}{(n-1)^2} + \cdots\right]$$

where

$$f(x) = y_{x,0} \quad\text{when } x > 0$$

and the definition is extended to $x \leq 0$ by setting

$$f(-x) = f(s - x).$$

If, to begin with, all urns contain the same number of white and black balls, so that
$f(x) = \text{const.} = p$, we shall have, no matter what $t$ is,

$$y_{x,t} = p.$$
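
The solution of Problem 14 can be checked numerically. The sketch below is only an illustration under stated assumptions: it reads the difference equation as $y_{x,t} = (1 - 1/n)\,y_{x,t-1} + (1/n)\,y_{x-1,t-1}$ with the periodic condition $y_{0,t} = y_{s,t}$, numbers the urns from 0 for convenience, and compares direct iteration with the binomial form of Lagrange's solution. The function names and the starting proportions are hypothetical.

```python
from fractions import Fraction
from math import comb

def one_round(y, n):
    """Apply one round of the cyclic exchange to the list of probabilities y."""
    s = len(y)
    # Urn x keeps n - 1 of its own balls and receives one ball from urn x - 1;
    # index -1 wraps around to the last urn, which is the periodic condition.
    return [Fraction(n - 1, n) * y[x] + Fraction(1, n) * y[x - 1] for x in range(s)]

def by_recurrence(y0, n, t):
    """Probabilities of white after t rounds, by repeated use of the equation."""
    y = list(y0)
    for _ in range(t):
        y = one_round(y, n)
    return y

def by_closed_form(y0, n, t):
    """The same probabilities from the binomial expansion of the solution."""
    s = len(y0)
    return [
        sum(comb(t, k) * Fraction(n - 1, n) ** (t - k) * Fraction(1, n) ** k
            * y0[(x - k) % s]          # f extended periodically to all integers
            for k in range(t + 1))
        for x in range(s)
    ]

# Hypothetical starting proportions of white balls in three urns of 10 balls.
start = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
after_seven = by_recurrence(start, 10, 7)
```

With equal starting proportions the list is left unchanged by every round, in agreement with the final remark of the solution.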
CHAPTER VI


BERNOULLI’S THEOREM

1. This chapter will be devoted to one of the most important and
beautiful theorems in the theory of probability, discovered by Jacob
Bernoulli and published with a proof remarkably rigorous (save for some
irrelevant limitations assumed in the proof) in his admirable posthumous
book “Ars conjectandi” (1713). This book is the first attempt at scientific
exposition of the theory of probability as a separate branch of
mathematical science.

If, in n trials, an event E occurs m times, the number m is called the
“frequency” of E in n trials, and the ratio m/n receives the name of
“relative frequency.” Bernoulli’s theorem reveals an important probability
relation between the relative frequency of E and its probability p.

Bernoulli’s Theorem. With the probability approaching 1 or certainty
as near as we please, we may expect that the relative frequency of an event E
in a series of independent trials with constant probability p will differ from
that probability by less than any given number $\epsilon > 0$, provided the number
of trials is taken sufficiently large.

In other words, given two positive numbers $\epsilon$ and $\eta$, the probability
P of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

will be greater than $1 - \eta$ if the number of trials is above a certain
limit depending upon $\epsilon$ and $\eta$.

Proof. Several proofs of this important theorem are known which 
are shorter and simpler but less natural than Bernoulli's original proof. 
It is his remarkable proof that we shall reproduce here in modernized 
form. 

a. Denoting by $T_m$, as usual, the probability of m successes in n trials,
we shall show first that

$$\frac{T_{b+k}}{T_b} < \frac{T_{a+k}}{T_a} \tag{1}$$

if $b > a$ and $k > 0$. Since the ratio

$$\frac{T_{x+1}}{T_x} = \frac{n-x}{x+1}\cdot\frac{p}{q}$$

decreases as x increases we have for $b > a$

$$\frac{T_{b+1}}{T_b} < \frac{T_{a+1}}{T_a} \quad\text{or}\quad \frac{T_{b+1}}{T_{a+1}} < \frac{T_b}{T_a}.$$

Changing b, a, respectively, into b + 1, a + 1; b + 2, a + 2; . . . ;
b + k - 1, a + k - 1, it follows from the last inequality that

$$\frac{T_{b+k}}{T_{a+k}} < \frac{T_{b+k-1}}{T_{a+k-1}} < \cdots < \frac{T_{b+1}}{T_{a+1}} < \frac{T_b}{T_a};$$

that is,

$$\frac{T_{b+k}}{T_b} < \frac{T_{a+k}}{T_a}.$$



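b. Integers $\lambda$ and $\mu$ being determined by the inequalities

$$\lambda - 1 < np \leq \lambda, \qquad \mu - 1 < np + n\epsilon \leq \mu$$

the probabilities A and C of the inequalities

$$0 \leq \frac{m}{n} - p < \epsilon; \qquad \frac{m}{n} - p \geq \epsilon$$

are represented, respectively, by the sums

$$A = T_\lambda + T_{\lambda+1} + \cdots + T_{\mu-1}$$
$$C = T_\mu + T_{\mu+1} + \cdots + T_n$$

the first of which contains $\mu - \lambda = g$ terms. Combining terms of the
second sum into groups of g terms (the last group may consist of fewer than
g terms) and setting for brevity

$$A_1 = T_\mu + T_{\mu+1} + \cdots + T_{\mu+g-1}$$
$$A_2 = T_{\mu+g} + T_{\mu+g+1} + \cdots + T_{\mu+2g-1}$$
$$A_3 = T_{\mu+2g} + T_{\mu+2g+1} + \cdots + T_{\mu+3g-1}$$

and so on, we shall have

$$C = A_1 + A_2 + A_3 + \cdots$$

and at the same time

$$\frac{A_1}{A} < \frac{T_\mu}{T_\lambda}; \quad \frac{A_2}{A_1} < \frac{T_\mu}{T_\lambda}; \quad \frac{A_3}{A_2} < \frac{T_\mu}{T_\lambda}; \quad \cdots \tag{2}$$

The ratio

$$\frac{A_1}{A} = \frac{T_{\lambda+g} + T_{\lambda+g+1} + \cdots + T_{\lambda+2g-1}}{T_\lambda + T_{\lambda+1} + \cdots + T_{\lambda+g-1}}$$

is less than the greatest of the numbers

$$\frac{T_{\lambda+g}}{T_\lambda}, \quad \frac{T_{\lambda+g+1}}{T_{\lambda+1}}, \quad \ldots, \quad \frac{T_{\lambda+2g-1}}{T_{\lambda+g-1}}.$$

But by inequality (1)

$$\frac{T_{\lambda+g}}{T_\lambda} > \frac{T_{\lambda+g+1}}{T_{\lambda+1}} > \cdots > \frac{T_{\lambda+2g-1}}{T_{\lambda+g-1}};$$

hence, since $\lambda + g = \mu$,

$$\frac{A_1}{A} < \frac{T_\mu}{T_\lambda}.$$

Similarly,

$$\frac{A_2}{A_1} < \frac{T_{\mu+g}}{T_\mu}$$

and again by inequality (1)

$$\frac{T_{\mu+g}}{T_\mu} < \frac{T_{\lambda+g}}{T_\lambda} = \frac{T_\mu}{T_\lambda}.$$

Consequently

$$\frac{A_2}{A_1} < \frac{T_\mu}{T_\lambda}.$$

In the same way

$$\frac{A_3}{A_2} < \frac{T_{\mu+2g}}{T_{\mu+g}}, \qquad \frac{T_{\mu+2g}}{T_{\mu+g}} < \frac{T_\mu}{T_\lambda}, \qquad \frac{A_3}{A_2} < \frac{T_\mu}{T_\lambda},$$

and inequalities (2) are established.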
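c. For $x \geq \lambda$

$$\frac{T_{x+1}}{T_x} < 1.$$

It suffices to show that

$$\frac{T_{\lambda+1}}{T_\lambda} = \frac{n-\lambda}{\lambda+1}\cdot\frac{p}{q} < 1.$$

As $\lambda \geq np$

$$\frac{n-\lambda}{\lambda+1}\cdot\frac{p}{q} \leq \frac{npq}{npq+q},$$

which shows that $\frac{T_{\lambda+1}}{T_\lambda} < 1$.

The inequality just established shows that in the following expression:

$$\frac{T_\mu}{T_\lambda} = \frac{T_\mu}{T_{\mu-1}}\cdot\frac{T_{\mu-1}}{T_{\mu-2}}\cdots\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}\cdots\frac{T_{\lambda+1}}{T_\lambda}$$

all the factors are < 1. Consequently, if we retain $\alpha \leq g$ first factors
only, replacing the others by 1, we get

$$\frac{T_\mu}{T_\lambda} \leq \frac{T_\mu}{T_{\mu-1}}\cdot\frac{T_{\mu-1}}{T_{\mu-2}}\cdots\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}.$$

Moreover,

$$\frac{T_\mu}{T_{\mu-1}} < \frac{T_{\mu-1}}{T_{\mu-2}} < \cdots < \frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}} = \frac{n-\mu+\alpha}{\mu-\alpha+1}\cdot\frac{p}{q},$$

whence the following important inequality results:

$$\frac{T_\mu}{T_\lambda} < \left(\frac{n-\mu+\alpha}{\mu-\alpha+1}\cdot\frac{p}{q}\right)^{\alpha}. \tag{3}$$

Here $\alpha$ is an arbitrary positive integer $\leq g$.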

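Now, let $\epsilon$ be an arbitrary positive number. Then we can show that
for

$$n \geq \frac{\alpha(1+\epsilon) - q}{\epsilon(p+\epsilon)} \tag{4}$$

we have both

$$\frac{n-\mu+\alpha}{\mu-\alpha+1}\cdot\frac{p}{q} \leq \frac{p}{p+\epsilon} \tag{i}$$

and

$$\alpha \leq g. \tag{ii}$$

Since $\mu \geq np + n\epsilon$, it suffices to show that (i) is satisfied for $\mu = np + n\epsilon$.
If $\mu = np + n\epsilon$ inequality (i) is equivalent to

$$\frac{nq - n\epsilon + \alpha}{np + n\epsilon - \alpha + 1} \leq \frac{q}{p+\epsilon}$$

or, after obvious simplifications,

$$n\epsilon(p+\epsilon) \geq \alpha(1+\epsilon) - q.$$

But this inequality follows from (4). To establish (ii), since $\alpha$ and $g$
are integers, it suffices to show that $\alpha < g + 1$. But $\mu \geq np + n\epsilon$,
$\lambda < np + 1$ and consequently $g + 1 > n\epsilon$. Hence (ii) will be established
if we can show that $n\epsilon \geq \alpha$, which by virtue of (4) will be true if

$$\frac{\alpha(1+\epsilon) - q}{p+\epsilon} \geq \alpha$$

that is, if

$$\alpha(1+\epsilon) - q \geq \alpha p + \alpha\epsilon$$

or $\alpha q - q \geq 0$, which is obviously true, $\alpha$ being a positive integer.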

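d. The auxiliary integer $\alpha$ is still at our disposal. Given an arbitrary
positive number $\eta < 1$ we shall determine $\alpha$ as the least integer satisfying
the inequality

$$\left(\frac{p}{p+\epsilon}\right)^{\alpha} \leq \eta.$$

At the same time

$$\left(\frac{p}{p+\epsilon}\right)^{\alpha-1} > \eta$$

and since

$$\log\left(1 + \frac{\epsilon}{p}\right) > \frac{\epsilon}{p+\epsilon}$$

we shall have

$$\alpha < 1 + \frac{p+\epsilon}{\epsilon}\log\frac{1}{\eta}$$

and

$$\frac{\alpha(1+\epsilon) - q}{\epsilon(p+\epsilon)} < \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

Consequently, if

$$n \geq \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon} \tag{5}$$

then by virtue of (i) and (3)

$$\frac{T_\mu}{T_\lambda} < \eta,$$

and by virtue of (2)

$$A_1 < A\eta, \quad A_2 < A_1\eta < A\eta^2, \quad A_3 < A_2\eta < A\eta^3, \quad \ldots$$

whence

$$C < A\eta + A\eta^2 + A\eta^3 + \cdots = \frac{A\eta}{1-\eta}. \tag{6}$$

This inequality holds if n satisfies (5). No trace of the auxiliary
integer $\alpha$ is left.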
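
Inequality (6) can be tested on a small case by exact arithmetic. The sketch below is an illustration only and rests on the following reading of the text: $\lambda$ and $\mu$ are the least integers not below $np$ and $np + n\epsilon$, A and C are the sums of $T_m$ over $\lambda \leq m < \mu$ and $\mu \leq m \leq n$, condition (5) is $n \geq \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}$ with natural logarithms, and (6) is $C < A\eta/(1-\eta)$. The function names and the numerical values are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, log

def trials_from_5(eps, eta):
    """Least integer n allowed by condition (5), as read above."""
    return ceil((1 + eps) / eps**2 * log(1 / eta) + 1 / eps)

def sums_A_and_C(n, p, eps):
    """Exact sums A and C for a binomial law with n trials and probability p."""
    q = 1 - p
    T = [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]
    lam = ceil(n * p)              # lambda - 1 < np <= lambda
    mu = ceil(n * p + n * eps)     # mu - 1 < np + n*eps <= mu
    return sum(T[lam:mu]), sum(T[mu:])

# Hypothetical small case: eps = eta = 1/10, so that exact fractions stay cheap.
eps, eta = Fraction(1, 10), Fraction(1, 10)
n = trials_from_5(float(eps), float(eta))
A, C = sums_A_and_C(n, Fraction(1, 3), eps)
holds = C < A * eta / (1 - eta)
```

In cases of this kind C falls far below the bound, which agrees with the later remark that the limit obtained for n is much larger than necessary.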

e. Let us now consider the inequalities

$$-\epsilon < \frac{m}{n} - p < 0 \quad\text{and}\quad \frac{m}{n} - p \leq -\epsilon$$

and introduce their respective probabilities B and D. These inequalities
are equivalent to

$$0 < \frac{n-m}{n} - q < \epsilon \quad\text{and}\quad \frac{n-m}{n} - q \geq \epsilon.$$

It is apparent that we can interpret B or D as probabilities that the number
of occurrences $m' = n - m$ of the event F opposite to E in n trials will
satisfy either the inequality

$$0 < \frac{m'}{n} - q < \epsilon \quad\text{or}\quad \frac{m'}{n} - q \geq \epsilon.$$

Since the right-hand side of (5) contains only given numbers $\epsilon$, $\eta$ it is clear that

$$D < \frac{B\eta}{1-\eta} \tag{7}$$

if (5) is satisfied.

Now $A + B = P$ is the probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

and $C + D = Q$ is the probability of the opposite inequality

$$\left|\frac{m}{n} - p\right| \geq \epsilon.$$

Hence $P + Q = 1$. Moreover, by (6) and (7)

$$Q < \frac{P\eta}{1-\eta}.$$

Consequently,

$$P + \frac{P\eta}{1-\eta} > 1$$

or

$$P > 1 - \eta$$

if only

$$n \geq \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

This completes the proof of Bernoulli's theorem.

For example, if $p = q = \frac{1}{2}$ and $\epsilon = 0.01$, $\eta = 0.001$ we get from (5)

$$n \geq 69{,}869$$

which shows that in 69,869 trials or more there are at least 999 chances
against 1 that the relative frequency will differ from 1/2 by less than 1/100.
The number 69,869 found as a lower limit of the number of trials is
much too large. A much smaller number of trials would suffice to fulfill
all the requirements. From a practical standpoint, it is important to
find as low a limit as possible for the necessary number of trials (given $\epsilon$
and $\eta$). With this problem we shall deal in the next chapter.
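
The figure 69,869 is easy to reproduce. The short sketch below assumes that inequality (5) reads $n \geq \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}$ with natural logarithms; this reading is a reconstruction, but it returns exactly the number quoted in the text. The function name is hypothetical.

```python
from math import ceil, log

def trials_bound(eps, eta):
    """Least integer n satisfying (5), as read above.

    The bound does not involve p: it serves for every constant probability.
    """
    return ceil((1 + eps) / eps**2 * log(1 / eta) + 1 / eps)

n_example = trials_bound(0.01, 0.001)   # the example of the text
```

Since the first term grows like $1/\epsilon^2$, halving $\epsilon$ roughly quadruples the required number of trials, while $\eta$ enters only through its logarithm.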

2. Bernoulli's theorem states that for arbitrarily given $\epsilon$ and $\eta$ there
exists a number $n_0(\epsilon, \eta)$ such that for any single value $n > n_0(\epsilon, \eta)$ the
probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

will be greater than $1 - \eta$. The question naturally arises, whether for
given $\epsilon$ and $\eta$ a number $N(\epsilon, \eta)$ depending upon $\epsilon$ and $\eta$ can be found such
that the probability of simultaneous inequalities

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for all $n > N(\epsilon, \eta)$ will still be greater than $1 - \eta$. The following theorem
due to Cantelli shows that this question can be answered positively.

Cantelli’s Theorem. For given $\epsilon < 1$, $\eta < 1$ let N be an integer
satisfying the inequality

$$N > \frac{2}{\epsilon^2}\log\frac{4}{\epsilon^2\eta} + 2.$$

The probability that the relative frequencies of an event E will differ from
p by less than $\epsilon$ in the Nth and all the following trials is greater than $1 - \eta$.
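Proof. We shall prove first that the probability $Q_n$ of the inequality

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

will always be less than $2e^{-\frac{1}{2}n\epsilon^2}$. According to results proved in the
preceding section for any $\eta > 0$

$$Q_n < \eta$$

if

$$n > \frac{1+\epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

This inequality, if we take $\eta = 2e^{-\frac{1}{2}n\epsilon^2}$, becomes

$$n > \frac{1+\epsilon}{2}\,n + \frac{1}{\epsilon} - \frac{1+\epsilon}{\epsilon^2}\log 2$$

and in this form it is evident, since for $\epsilon < 1$

$$\epsilon - (1+\epsilon)\log 2 < 1 - 2\log 2 < 0.$$

Hence, as stated,

$$Q_n < 2e^{-\frac{1}{2}n\epsilon^2}. \tag{8}$$

The event A, in which we are interested, consists in simultaneous
fulfillment of all the inequalities

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for n = N, N + 1, N + 2, . . . . The opposite event B consists in
the fulfillment of at least one of the inequalities

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

where n can coincide either with N, or with N + 1, or with N + 2, . . . .
The probability of B, which we shall denote by R, certainly does not
exceed the sum of the probabilities of all the inequalities

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

for n = N, N + 1, N + 2, . . . .

Consequently, referring to (8),

$$R < 2\sum_{n=N}^{\infty} e^{-\frac{1}{2}n\epsilon^2} = \frac{2e^{-\frac{1}{2}N\epsilon^2}}{1 - e^{-\frac{1}{2}\epsilon^2}}.$$

To satisfy the inequality

$$\frac{2e^{-\frac{1}{2}N\epsilon^2}}{1 - e^{-\frac{1}{2}\epsilon^2}} < \eta$$

it suffices to take

$$N > \frac{2}{\epsilon^2}\log\frac{2}{\eta} + \frac{2}{\epsilon^2}\log\frac{1}{1 - e^{-\frac{1}{2}\epsilon^2}}.$$

Now

$$\frac{2}{\epsilon^2}\log\frac{1}{1 - e^{-\frac{1}{2}\epsilon^2}} < \frac{2}{\epsilon^2}\log\frac{2}{\epsilon^2} + 2.$$

Consequently, if

$$N > \frac{2}{\epsilon^2}\log\frac{4}{\epsilon^2\eta} + 2$$

we shall have $R < \eta$ and at the same time the probability of A will be
greater than $1 - \eta$, which proves Cantelli’s theorem.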
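
The last step of the proof lends itself to a numerical check. The sketch below assumes that the limit in Cantelli's theorem reads $N > \frac{2}{\epsilon^2}\log\frac{4}{\epsilon^2\eta} + 2$ and that the bound for R is the geometric series $2\sum_{n \geq N} e^{-n\epsilon^2/2}$; both are reconstructions of formulas damaged in this copy. For several hypothetical pairs $(\epsilon, \eta)$ it computes the least admissible N and the corresponding value of the series.

```python
from math import exp, expm1, floor, log

def cantelli_N(eps, eta):
    """Least integer N with N > (2/eps^2) * log(4/(eps^2 * eta)) + 2."""
    bound = (2 / eps**2) * log(4 / (eps**2 * eta)) + 2
    return floor(bound) + 1

def tail_sum(N, eps):
    """Sum of 2*exp(-n*eps^2/2) over n = N, N + 1, ... (a geometric series)."""
    x = eps**2 / 2
    # -expm1(-x) equals 1 - exp(-x) and stays accurate when x is small.
    return 2 * exp(-N * x) / (-expm1(-x))

# Hypothetical pairs (eps, eta); each row holds eps, eta, N and the bound for R.
table = [(eps, eta, cantelli_N(eps, eta), tail_sum(cantelli_N(eps, eta), eps))
         for eps, eta in [(0.5, 0.5), (0.1, 0.05), (0.01, 0.001)]]
```

In every row the fourth entry stays below $\eta$, as the proof requires. For $\epsilon = 0.01$, $\eta = 0.001$ the limit for N is about five times the limit 69,869 found for a single value of n.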


Significance of Bernoulli's Theorem

3. As was indicated in the Introduction, one of the most important 
problems in the theory of probability consists in the discovery of cases 
where the probability is very near to 0 or, on the contrary, very near to 1,
because cases with very small or very great probability may have real
practical interest. In Bernoulli's theorem we have a case of this kind;
the theorem shows that with the probability approaching as near to 1 
or certainty as we please, we may expect that in a sufficiently long 
series of independent trials with constant probability, the relative fre- 
quency of an event will differ from that probability by less than any 
specified number, no matter how small. But it lies in the nature of the 
idea of mathematical probability, that when it is near 1, or, on the con- 
trary, very small, we may consider an event with such probability as 
practically certain in the first case, and almost impossible in the second. 
The reason is purely empirical. 

To illustrate what we mean, let us consider an indefinite series of
independent trials, in which the probability of a certain event remains
constantly equal to 1/2. It can be shown that if the number of trials
is, for instance, 40,000 or more, we may expect with a probability > 0.999
that the relative frequency of the event will differ from 1/2 by less than
0.01. In other words, we are entitled to bet at least 999 against 1 that
the actual number of occurrences will lie between the limits 0.49n and
0.51n if $n \geq 40{,}000$. If we could make a positive statement of this
kind without any mention of probability, we should be offering an ideal
scientific prediction. However, our knowledge in this case is incomplete
and all we are entitled to state is this: we are more sure to be right in
predicting the above limits for the number of occurrences than in expecting
to draw a white ball from an urn containing 999 white and only 1
black ball.
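
The statement about 40,000 trials can be confirmed by summing the binomial probabilities directly. The sketch below is an illustration, not the method of the text: it evaluates each term through logarithms of factorials (`math.lgamma`) so that no enormous numbers arise, and it takes p and $\epsilon$ as exact fractions so that the strict inequality $|m/n - p| < \epsilon$ is applied without rounding at the limits. The function names are hypothetical, and p must lie strictly between 0 and 1.

```python
from fractions import Fraction
from math import ceil, exp, floor, lgamma, log

def log_term(n, m, p):
    """Logarithm of the probability of m successes in n trials."""
    return (lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)
            + m * log(p) + (n - m) * log(1 - p))

def prob_within(n, p, eps):
    """Probability that |m/n - p| < eps; p and eps are Fractions."""
    low = max(floor(n * (p - eps)) + 1, 0)    # least m with m/n > p - eps
    high = min(ceil(n * (p + eps)) - 1, n)    # greatest m with m/n < p + eps
    pf = float(p)
    return sum(exp(log_term(n, m, pf)) for m in range(low, high + 1))

P_40000 = prob_within(40_000, Fraction(1, 2), Fraction(1, 100))
P_10000 = prob_within(10_000, Fraction(1, 2), Fraction(1, 100))
```

Here `P_40000` exceeds 0.999, while `P_10000` does not, so that with these limits a number of trials of the order of tens of thousands is really needed.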

In practical matters, where our actions almost never can be directed 
with perfect confidence, even incomplete knowledge may be taken as a 
sure guide. Whoever has tried to win on a single ticket out of 10,000 
knows from experience that it is virtually impossible. Now the convic- 
tion of impossibility would be still greater if one tried to win on a single 
ticket out of 1,000,000. 

In the light of such examples, we understand what value may be 
attached to statements derived from Bernoulli's theorem: Although the 
fact we expect is not bound to happen, the probability of its happening 
is so great that it may really be considered as certain. Once in a great 
while facts may happen contrary to our expectations, but such rare excep- 
tions cannot outweigh the advantages in everyday life of following the 
indications of Bernoulli's theorem. And herein lies its immense practical 
value and the justification of a science like the theory of probability. 

It should, however, be borne in mind that little, if any, value can be 
attached to practical applications of Bernoulli's theorem, unless the 
conditions presupposed in this theorem are at least approximately ful- 
filled: independence of trials and constant probability of an event for 
every trial. And in questions of application it is not easy to be sure 
whether one is entitled to make use of Bernoulli's theorem; consequently, 
it is too often used illegitimately. 

It is easy to understand how essential it is to discover propositions 
of the same character under more general conditions, paying especial 
attention to the possible dependence of trials. There have been valuable 
achievements in this direction. In the proper place, we shall discuss the 
more important generalizations of Bernoulli's theorem. 

4. When the probability of an event in a single experiment is known, 
Bernoulli's theorem may serve as a guide to indicate approximately how 
often this event can be expected to occur if the same experiments are 
repeated a considerable number of times under nearly the same condi- 
tions. When, on the contrary, the probability of an event is unknown 
and the number of experiments is very large, the relative frequency of 
that event may be taken as an approximate value of its probability. 
Bernoulli himself, in establishing his theorem, had in mind the approxi- 
mate evaluation of unknown probabilities from repeated experiments. 
That is evident from his explanations preceding the statement of the
theorem itself and its proof. Inasmuch as these explanations are interesting
in themselves, and present the original thoughts of the great discoverer,
we deem it advisable here to give a free translation from Bernoulli's
book. After calling attention to the fact that only in a few cases can
probabilities be found a priori, Bernoulli proceeds as follows: 

So, for example, the number of cases for dice is known. Evidently there are 
as many cases for each die as there are faces, and all these cases have an equal 
chance to materialize. For, by virtue of the similitude of faces and the uniform 
distribution of weight in a die, there is no reason why one face should show up 
more readily than another, as there would be if the faces had a different shape 
or if one part of a die were made of heavier material than another. So one knows 
the number of cases when a white or a black ticket can be drawn from an urn, 
and besides, it is known that all these cases are equally possible, because the num- 
bers of tickets of both kinds are determined and known, and there is no apparent 
reason why one of these tickets could be drawn more readily than any other. 
But, I ask you, who among mortals will ever be able to define as so many cases, 
the number, e.g., of the diseases which invade innumerable parts of the human 
body at any age and can cause our death? And who can say how much more 
easily one disease than another — plague than dropsy, dropsy than fever — can 
kill a man, to enable us to make conjectures about the future state of life or 
death? Who, again, can register the innumerable cases of changes to which the 
air is subject daily, to derive therefrom conjectures as to what will be its state 
after a month or even after a year? Again, who has sufficient knowledge of the 
nature of the human mind or of the admirable structure of our body to be able, 
in games depending on acuteness of mind or agility of body, to enumerate cases 
in which one or another of the participants will win? Since such and similar 
things depend upon completely hidden causes, which, besides, by reason of the 
innumerable variety of combinations will forever escape our efforts to detect 
them, it would plainly be an insane attempt to get any knowledge in this fashion. 

However, there is another way to obtain what we want. And what is impossi- 
ble to get a priori, at least can be found a posteriori; that is, by registering the 
results of observations performed a great many times. Because it must be pre- 
sumed that something may occur or not occur, as many times as it had previously 
been observed to occur or not occur under similar conditions. For instance, if, 
in the past, 300 men of the same age and physical build as Titus is now, were 
investigated, and it were found that 200 of them had died within a decade, the 
others continuing to enjoy life past this term, one could pretty safely conclude 
that there are twice as many cases for Titus to pay his debt to nature within the 
next decade than to survive beyond this term. So it is, if somebody for many 
preceding years had observed the weather and noticed how many times it was 
fair or rainy; or if somebody attended games played by two persons a great many 
times and noticed how often one or the other won; by these very observations he 
would be able to discover the ratio of cases which in the future might favor the 
occurrence or failure of the same event under similar circumstances. 

And this empirical way of determining the number of cases by experiments is
neither new nor unusual. For the author of the book “Ars cogitandi,” a man
of great acumen and ingenuity, in Chap. 12 recommends a similar procedure,
and everybody does the same in daily practice. Moreover, it cannot be concealed
that for reasoning in this fashion about some event, it is not sufficient to
make a few experiments, but a great quantity of experiments is required; because
even the most stupid ones by some natural instinct and without any previous 
instruction (which is rather remarkable) know that the more experiments are 
made, the less is the danger to miss the scope. 

Although this is naturally known to anyone, the proof based on scientific 
principles is by no means trivial, and it is our duty now to explain it. However, 
I would consider it a small achievement if I could only prove what everybody 
knows anyway. There remains something else to be considered, which perhaps 
nobody has even thought of. Namely, it remains to inquire, whether by thus 
augmenting the number of experiments the probability of getting a genuine ratio 
between numbers of cases, in which some event may occur or fail, also augments 
itself in such a manner as finally to surpass any given degree of certitude; or 
whether the problem, so to speak, has its own asymptote; that is, there exists a 
degree of certitude which never can be surpassed no matter how the observations 
are multiplied; for instance, that it never is possible to have a probability greater
than 1/2, or 2/3, or 3/4 that the real ratio has been attained. To illustrate this by an
example, suppose that, without your knowledge, 3,000 white stones and 2,000 
black stones are concealed in a certain urn, and you try to discover their numbers 
by drawing one stone after another (each time putting back the stone drawn 
before taking the next one, in order not to change the number of stones in the 
urn) and notice how often a white or a black stone appears. The question is, 
can you make so many drawings as to make it 10, or 100, or 1,000, etc., times 
more probable (that is, morally certain) that the ratio of frequencies of white and 
black stones will be 3 to 2, as is the case with the number of stones in the urn, 
than any other ratio different from that? If this were not true, I confess nothing 
would be left of our attempt to explore the number of cases by experiments. 
But if this can be attained and moral certitude can finally be acquired (how that 
can be done I shall show in the next chapter), we shall have cases enumerated a 
posteriori with almost the same confidence as if they were known a priori. And 
that, for practical purposes, where “morally certain” is taken for “absolutely
certain” by Axiom 9, Chap. II, is abundantly sufficient to direct our conjectures 
in any contingent matter not less scientifically than in games of chance. 

For if instead of an urn we take the air or the human body, that contain in 
themselves sources of various changes or diseases as the urn contains stones, we 
shall be able in the same manner to determine by observations how much more 
likely one event is to happen than another in these subjects. 

To avoid misunderstanding, one must bear in mind that the ratio of cases 
which we want to determine by experiments should not be taken in the sense of a 
precise and indivisible ratio (for then just the contrary would happen, and the 
probability of attaining a true ratio would diminish with the increasing number of 
observations) but as an approximate one; that is, within two limits, which,
however, can be taken as near as we wish to each other. For instance, if, in the
case of the stones, we take pairs of ratios 301/200 and 299/200 or 3001/2000 and
2999/2000, etc., it can be shown that it will be more probable than any degree of
probability that the ratio found in experiments will fall within these limits than 
outside of them. Such, therefore, is the problem which we have decided to 
publish here, now that we have struggled with it for about twenty years. The
novelty of this problem as well as its great utility, combined with equal difficulty,
may add to the weight and value of other parts of this doctrine.
(“Ars Conjectandi,” pars quarta, Cap. IV, pp. 224-227.)


Application to Games of Chance 

5. One of the cases in which the conditions for application of Ber- 
noulli’s theorem are fulfilled is that of games of chance. It is not out 
of place to discuss the question of the commercial values of games from 
the standpoint of Bernoulli’s theorem. “Game of chance” is the term
we apply to any enterprise which may give us profit or may cause us 
loss, depending on chance, the probabilities of gain or loss being known. 
The following considerations can be applied, therefore, to more serious 
questions and not only to games played for pastime or for the sake of 
gaining money, as in gambling. 

Suppose that, by the conditions of the game, a player can win a
certain sum $a$ of money, with the probability $p$; or can lose another
sum $b$ with the probability $q = 1 - p$.

If this game can be repeated any number of times under the same 
conditions, the question arises as to the probability for a player to gain 
or lose a sum of money not below a given limit. Let us denote by n 
the total number of games, and by m the number of times the player 
wins. Considering a loss as a negative gain, his total gain will be 

$$K = ma - (n - m)b.$$

It is convenient to introduce instead of m another number $\alpha$ defined by

$$\alpha = m - np$$

and called “discrepancy.” Expressed in terms of $\alpha$ the preceding expression
for the gain becomes

$$K = n(pa - qb) + (a + b)\alpha.$$

The expression

$$E = pa - qb$$

entering as the coefficient of n has, as we shall see, an important bearing
on the conclusion as to the commercial value of the game. It is called the
“mathematical expectation” of the player. Suppose at first that this
expectation is positive. By Bernoulli’s theorem the probability for a
discrepancy less than $-n\epsilon$, $\epsilon$ being an arbitrary positive number, is
smaller than any given number, provided, of course, the number of games
is sufficiently large. At the same time, with the probability approaching
1 as near as we please, we may expect the discrepancy to be $\geq -n\epsilon$.
However, if this is the case, the total gain will surpass the number

$$n[E - \epsilon(a + b)]$$

which, for sufficiently large n, itself is greater than any specified positive
number. It is supposed, of course, that $\epsilon$ is small enough to make the
difference

$$E - \epsilon(a + b)$$

positive. And that means that the player whose mathematical expecta- 
tion is positive may expect with a probability approaching certainty as 
near as we please to gain an arbitrarily large amount of money if nothing 
prevents him from playing a sufficient number of games. 
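
The two expressions for the gain and the effect of a positive expectation can be illustrated by exact computation. The sketch below assumes the notation of this section as read here: the player wins $a$ with probability $p$, loses $b$ with probability $q = 1 - p$, the discrepancy is $\alpha = m - np$, and the gain is $K = ma - (n - m)b = n(pa - qb) + (a + b)\alpha$. The stakes and the probability chosen are hypothetical.

```python
from fractions import Fraction
from math import comb

def gain(n, m, a, b):
    """Total gain after n games of which m were won."""
    return m * a - (n - m) * b

def gain_from_discrepancy(n, m, a, b, p):
    """The same gain written as n*E + (a + b)*alpha with alpha = m - n*p."""
    q = 1 - p
    alpha = m - n * p
    return n * (p * a - q * b) + (a + b) * alpha

def prob_gain_exceeds(n, a, b, p, level):
    """Exact probability that the total gain in n games is greater than level."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m)
               for m in range(n + 1) if gain(n, m, a, b) > level)

# Hypothetical game: equal stakes, p = 11/20, so that E = 1/10 per game.
p = Fraction(11, 20)
chance_10 = prob_gain_exceeds(10, 1, 1, p, 0)
chance_1000 = prob_gain_exceeds(1000, 1, 1, p, 0)
```

With these values the chance of being ahead is little more than one half after 10 games but exceeds 0.99 after 1,000 games, which is the behavior the preceding argument describes.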

On the contrary, by a similar argument, we can see that in case of 
a negative mathematical expectation, the player has an arbitrarily small 
probability to escape a loss of an arbitrarily large amount of money, 
again under the condition that he plays a sufficiently large number of 
games. 

Finally, if the mathematical expectation is 0, it is impossible to make 
any definite statement concerning the gain or loss by the player, except 
that it is very unlikely that the amount of gain or loss will be considerable 
compared with the number of games. 

It follows from this discussion that the game is certainly favorable 
for the player if his mathematical expectation is positive, and unfavorable 
if it is negative. In case the mathematical expectation is 0, neither 
of the parties participating in the game has a decided advantage and then 
the game is called equitable. Usually, games serving as amusements are 
equitable. On the contrary, all of the games operated for commercial 
purposes by individuals or corporations are expressly made to be profita- 
ble for the administration; that is, the mathematical expectation of the 
administration of a game operated for lucrative purposes is positive at 
each single turn of the game and, correspondingly, the expectation of any 
gambler is negative. This confirms the common observation that those 
gamblers who extend their gambling over large numbers of games are 
almost inevitably ruined. At the same time, the theory agrees with 
the fact that great profits are derived by the administrations of gaming 
places. 

A good illustration is afforded by the French lottery mentioned on 
page 19, which, as is well known, was a very profitable enterprise operated 
by the French government. Now, if we consider the mathematical 
expectation of ticket holders in that lottery, we find that it was negative 
in all cases; namely, denoting by M the sum paid for tickets, we find the 
following expectations: 

On 1 ticket — 1)M == — JM, 

On 2 tickets (fff — 1)^ = 

On 3 tickets 


and so forth.

On the other hand, the expectation of the administration was always
positive, and because of the great number of persons taking part in this 
lottery, the number of games played by the administration was enormous, 
and it was assured of a steady and considerable income. This was an 
enterprise avowedly operated for the purpose of gambling, but the same 
principles underlie the operations of institutions having great public 
value, such as insurance companies, which, to secure their income, always 
reserve certain advantages for themselves. 

Experimental Verification of Bernoulli’s Theorem 

6. Bernoulli’s theorem, like any other mathematical proposition, is 
a deduction from ideal premises. To what extent these premises may be 
considered as a good approximation to reality can be decided only by 
experiments. Several experiments established for the purpose of testing 
various theoretical statements derived from general propositions of the 
theory of probability, are reported by different authors. Here we shall 
discuss those purporting to test Bernoulli’s theorem. 

I. Buffon, the French naturalist of the eighteenth century, tossed a 
coin 4,040 times and obtained 2,048 heads and 1,992 tails. Assuming
that his coin was ideal, we have a probability of 1/2 for either heads or
tails. Now, the relative frequencies obtained by his experiments are:

$\frac{2048}{4040} = 0.507$ for heads
$\frac{1992}{4040} = 0.493$ for tails

and they differ very little from the corresponding probabilities, 0.500. 
In this case, the conclusions one might derive from Bernoulli’s theorem 
are verified in a very satisfactory manner. 

II. De Morgan, in his book “Budget of Paradoxes” (1872), reports
the results of four similar experiments. In each of them a coin was
tossed 2,048 times and the observed frequencies of heads were, respectively,
1,061, 1,048, 1,017, 1,039. The relative frequencies corresponding
to these numbers are

$$\frac{1061}{2048} = 0.518; \quad \frac{1048}{2048} = 0.512; \quad \frac{1017}{2048} = 0.497; \quad \frac{1039}{2048} = 0.507.$$

The agreement with the theory again is satisfactory. 

III. Charlier, in his book “Grundzüge der mathematischen Statistik,”
reports the results of 10,000 drawings of one playing card out of a full
deck. Each card drawn was returned to the deck before the next drawing.
The actual result of these experiments was that black cards
appeared 4,933 times, and consequently the frequency of red cards was
5,067. The relative frequencies in this instance are:

$\frac{4933}{10000} = 0.4933$ for a black card
$\frac{5067}{10000} = 0.5067$ for a red card

and they differ but slightly from the probability, 0.5000, that the card
drawn will be black or red. The agreement between theory and experiment
in this case, too, is satisfactory.

IV. The author of this book made the following experiment with
playing cards: After excluding the 12 face cards from the pack, 4 cards
were drawn at a time from the remaining 40, and the number of trials
was carried to 7,000. The number of times in each thousand that the
four cards belonged to different suits, was:

  I    II   III   IV    V    VI   VII

 113   113   103   105   105   118   108

Altogether the frequency of such cases was 765 in 7,000 trials, whence
we find for the relative frequency

$$\frac{765}{7000} = 0.1093$$

while the probability for taking 4 cards belonging to different suits is

$$\frac{1000}{9139} = 0.1094.$$
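
The probability quoted in experiment IV follows from a simple count: one card must be taken from each of the four suits of 10 cards, and the number of such selections is divided by the number of ways of taking 4 cards out of 40. The sketch below carries out this count and the division of the observed total; the variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

# 40 cards remain, 10 of each suit; choose one card from each of the 4 suits.
p_four_suits = Fraction(10**4, comb(40, 4))

# Observed counts in the seven successive thousands of trials.
per_thousand = [113, 113, 103, 105, 105, 118, 108]
observed = Fraction(sum(per_thousand), 7000)

difference = float(observed - p_four_suits)
```

The difference between the observed relative frequency and the probability is only about one unit in the fourth decimal place.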

V. In J. L. Coolidge’s “Introduction to Mathematical Probability,”
one finds a reference to an experiment made by Lieutenant R. S. Hoar,
U.S.A., but the reported results are incomplete. The author of this book
repeated the same experiment, which consisted in 1,000 drawings of 5 cards
at a time, from a full pack of 52 cards. The results were: 503 times the
5 cards were each of different denominations; 436 times 2 were of the same
denomination with 3 scattered; 45 times there were 2 pairs of 2 different
denominations and 1 odd card; 14 times 3 were of the same denomination
with 2 scattered; 2 times there were 2 of one denomination and 3 of
another. The remaining possible combination, 4 cards of the same
denomination with 1 odd, never appeared. The probabilities of these
different cases are, respectively,

$$\frac{2112}{4165} = 0.507; \quad \frac{352}{833} = 0.423; \quad \frac{198}{4165} = 0.048;$$

$$\frac{88}{4165} = 0.021; \quad \frac{6}{4165} = 0.001; \quad \frac{1}{4165} = 0.000.$$

The corresponding theoretical frequencies are 507, 423, 48, 21, 1, 0, 
while the observed frequencies were 503, 436, 45, 14, 2, 0. The dis- 
crepancies are generally small and the greatest of them, 13, is still within 
reasonable limits. Deeper investigation shows that the probability that 
a discrepancy will not exceed 13 is about hence, the observed deviation 
of 13 units cannot be considered abnormal. 
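
The six probabilities of experiment V can be recomputed by counting hands of 5 cards according to denominations only (suits matter merely through the number of ways of choosing them). The sketch below does this with exact fractions and forms the theoretical frequencies in 1,000 drawings; the labels and variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

total = comb(52, 5)   # number of ways of drawing 5 cards from the full pack

counts = {
    "all different":     comb(13, 5) * 4**5,
    "one pair":          13 * comb(4, 2) * comb(12, 3) * 4**3,
    "two pairs":         comb(13, 2) * comb(4, 2)**2 * 44,
    "three alike":       13 * comb(4, 3) * comb(12, 2) * 4**2,
    "two and three":     13 * comb(4, 3) * 12 * comb(4, 2),
    "four alike":        13 * 48,
}

probabilities = {name: Fraction(c, total) for name, c in counts.items()}
theoretical = [round(1000 * float(pr)) for pr in probabilities.values()]
observed = [503, 436, 45, 14, 2, 0]
largest = max(abs(o - t) for o, t in zip(observed, theoretical))
```

The six cases exhaust all possible hands, so the counts add up to `total`, and the greatest difference between observed and theoretical frequencies is the 13 mentioned in the text.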

VI. Bancroft H. Brown published, in the American Mathematical 
Monthly , page 351, the results of a series of 9,900 games of craps. 
This game is played with two dice, and the caster wins unconditionally 
if he produces 7 or 11 points, which are called “naturals”; he loses the
game in case of 2, 3, or 12 points, called “craps.” But if he produces
4, 5, 6, 8, 9, or 10 points, he does not win, but has the right to cast the 
dice an unlimited number of times until he throws the same number of 
points that he had before, or until he throws a 7. If he throws 7 before 
obtaining his point, he loses the game; otherwise he wins. 

It is a good exercise to find the probability of winning this game.
It is

$$\frac{244}{495} = 0.493$$

that is, a little less than 1/2. Multiplying the number of games, in our
case 9,900, by this probability, we find that the theoretical number of 
successes is 4,880 and of failures, 5,020. Now, according to Bancroft H. 
Brown, the actual numbers of successes and losses are, respectively, 
4,871 and 5,029. The discrepancy 

4871 - 4880 = -9 

is extremely small, even smaller than could reasonably be expected. 
The same article gives the number of times craps'^ were produced; 
namely, 2 appeared 259 times, 3 appeared 508 times, and 12 appeared 
293 times, making the total number of craps 1,060. The probability 
of obtaining craps is 

^ + A" + ■?V ~ i 

hence, the theoretical number of craps should be 1,100. The discrepancy, 
1060 ~ 1100 = —40, is more considerable this time but still lies within 
reasonable limits. 
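The exercise of finding the probability of winning at craps can be carried out mechanically. The sketch below (Python, exact rational arithmetic; the function names are illustrative and not from the source) uses the fact that, once a point is established, only throws of that point or of 7 decide the game.

```python
from fractions import Fraction

def two_dice_distribution():
    """Probability of each total 2..12 with two fair dice."""
    return {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

def craps_win_probability():
    p = two_dice_distribution()
    win = p[7] + p[11]                      # "naturals" win at once
    for point in (4, 5, 6, 8, 9, 10):
        # Given the point, the caster wins if the point comes before a 7.
        win += p[point] * p[point] / (p[point] + p[7])
    return win

def craps_probability():
    p = two_dice_distribution()
    return p[2] + p[3] + p[12]

if __name__ == "__main__":
    w = craps_win_probability()
    print(w, float(w), 9900 * w, 9900 * craps_probability())
```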

VII. E. Czuber made a complete investigation of lotteries operated
on the same plan as the French lottery, in Prague between 1754 and 1886,
and in Brünn between 1771 and 1886. The number of drawings was
2,854 in Prague and 2,703 in Brünn. The probability that in each
drawing the sequence of numbers is either increasing or decreasing, is

$$\frac{2}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} = \frac{1}{60} = 0.01667$$

while the observed relative frequency of such cases was

Prague: 0.01612; Brünn: 0.01739

and in both places combined

0.01674.

The probabilities that among five numbers in each drawing there is
none or only one of the numbers 1, 2, 3, ... 9, are, respectively,

0.58298 and 0.34070.

The corresponding relative frequencies were

Prague: 0.58655 and 0.32656
Brünn: 0.57899 and 0.34591

and in both places combined

0.58183 and 0.33587, respectively.

The probability of drawing a determined number is 1/18. Now, according
to Czuber, for the lottery in Prague the actual number of occurrences for
single tickets varied from 138 (for No. 6) to 189 (for No. 83), so that for
all tickets the discrepancy varied from -20 to 31. Besides, there were
only 16 numbers with a discrepancy greater than 15 in absolute value.
All these results stand in good accord with the theory.
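The theoretical values used in this comparison follow from simple counting, on the assumption (standard for the French lottery) that 5 numbers are drawn out of 90. A short Python sketch, with illustrative function names that do not appear in the source:

```python
from fractions import Fraction
from math import comb, factorial

NUMBERS, DRAWN = 90, 5   # assumed plan of the French lottery

def monotone_probability():
    """Five distinct numbers appear in increasing or decreasing order."""
    return Fraction(2, factorial(DRAWN))

def low_number_probability(k, low=9):
    """Exactly k of the five drawn numbers lie among 1, 2, ..., low."""
    return Fraction(comb(low, k) * comb(NUMBERS - low, DRAWN - k),
                    comb(NUMBERS, DRAWN))

def expected_appearances(drawings):
    """Expected number of appearances of one determined number."""
    return Fraction(DRAWN, NUMBERS) * drawings

if __name__ == "__main__":
    print(float(monotone_probability()))
    print(float(low_number_probability(0)), float(low_number_probability(1)))
    print(float(expected_appearances(2854)))
```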

VIII. One of the most striking experimental tests of Bernoulli's
theorem was made in connection with a problem considered for the first
time by Buffon. A board is ruled with a series of equidistant parallel
lines, and a very fine needle, which is shorter than the distance between
lines, is thrown at random on the board. Denoting by l the length of
the needle and by h the distance between lines, the probability that the
needle will intersect one of the lines (the other possibility is that the
needle will be completely contained within the strip between two lines) is
found to be

$$p = \frac{2l}{\pi h}.$$

The remarkable thing about this expression is that it contains the
number π = 3.14159 ... expressing the ratio of the circumference of a
circle to its diameter. In the appendix we shall indicate how this
expression can be obtained, because in this problem we deal with a different
concept of probability.

Suppose we throw the needle a great many times and count the
number of times it cuts the lines. By Bernoulli's theorem we may expect
that the relative frequency of intersections will not differ greatly from
the theoretical probability, so that, equating them, we have the means of
finding an approximate value of π.

One series of experiments of this kind was performed by R. Wolf,
astronomer in Zurich, between 1849 and 1853. In his experiments the
width of the strips was 45 mm., and the length of the needle was 36 mm.
Thus the theoretical probability of intersections is

$$\frac{72}{45\pi} = 0.5093.$$

The needle was thrown 5,000 times and it cut the lines 2,532 times;
whence, the relative frequency

$$\frac{2532}{5000} = 0.5064.$$

The agreement between the two numbers is very satisfactory. If,
relying on Bernoulli's theorem, we set the approximate equation

$$\frac{72}{45\pi} = 0.5064,$$

we should find the number 3.1596 for π, which differs from the known
value of π by less than 0.02.

In another experiment of the same kind reported by De Morgan in
the aforementioned book, Ambrose Smith in 1855 made 3,204 trials with
a needle the length of which was 3/5 of the distance between lines. There
were 1,213 clear intersections, and 11 contacts on which it was difficult
to decide. If on this ground, we should consider half of them as
intersections, we should obtain about 1,218 intersections in 3,204 trials, which
would give the number 3.155 for π. If all of the contacts had been treated
as intersections the result would have been 3.1412, very close to the
real value of π.

In an excellent book "Calcolo delle Probabilità," vol. 1, page 183,
1925, by G. Castelnuovo, reference is made to experiments performed by
Professor Reina under whose direction a needle of 3 cm. in length was
thrown 2,520 times, the distance between lines being 6 cm. Taking into
account the thickness of the needle, the probability of intersection was
found to be 0.345, while actual experiments gave the relative frequency
of intersections as 0.341.
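The values of π deduced from the experiments of Wolf and Smith can be recomputed by equating the observed relative frequency to 2l/(πh). The following Python sketch is only an illustration; the function name is an assumption, and the figure 1,218.5 stands for "half of the contacts" counted as intersections.

```python
def pi_estimate(needle, spacing, throws, crossings):
    """Solve crossings / throws = 2 * needle / (pi * spacing) for pi."""
    return 2 * needle * throws / (spacing * crossings)

if __name__ == "__main__":
    print(pi_estimate(36, 45, 5000, 2532))     # Wolf
    print(pi_estimate(3, 5, 3204, 1218.5))     # Smith, half of the contacts counted
    print(pi_estimate(3, 5, 3204, 1224))       # Smith, all contacts counted
```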

Appendix

Buffon's Needle Problem. Let h be the width of the strip between
two lines and l < h the length of the needle. The position of the needle
can be determined by the distance x of its middle point from the nearest
line and the acute angle φ formed by the needle and a perpendicular
dropped from the middle point to the line. It is apparent that x may
vary from 0 to h/2 and φ varies within the limits 0 and π/2. We cannot
define in the usual way the probability of the needle cutting the line, for
there are infinitely many cases with respect to the position of the needle.
However, it is possible to treat this problem as the limiting case of
another problem with a finite number of possible cases, where the usual
definition of probability can be applied.

Suppose that h/2 is divided into an arbitrary number m of equal
parts δ = h/2m and the right angle π/2 into n equal parts ω = π/2n.
Suppose, further, that the distance x may have only the values

$$0, \delta, 2\delta, \ldots, m\delta$$

and the angle φ the values

$$0, \omega, 2\omega, \ldots, n\omega.$$

This gives

$$N = (m + 1)(n + 1)$$

cases as to the position of the needle, and it is reasonable to assume that
these cases are equally likely. To find the number of favorable cases, we
notice that the needle cuts one of the lines if x and φ satisfy the inequality

$$x < \frac{l}{2}\cos\varphi.$$

The number of favorable cases, therefore, is equal to the number of
systems of integers i, j satisfying the inequality

$$\text{(A)} \qquad i < \frac{l}{2\delta}\cos j\omega$$

supposing that i may assume only the values 0, 1, 2, ... m and j only
the values 0, 1, 2, ... n. Because we suppose l < h the greatest
value of i satisfying condition (A) is less than m and we can disregard
the requirement that i should be ≤ m. Now for given j there are k + 1
values of i satisfying (A) if k denotes the greatest integer which is less
than

$$\frac{l}{2\delta}\cos j\omega.$$

In other words, k is an integer determined by the conditions

$$k < \frac{l}{2\delta}\cos j\omega \le k + 1.$$

The number of possible values for i corresponding to a given j can
therefore be represented thus

$$m_j = \frac{l}{2\delta}\cos j\omega + \theta_j$$

where $\theta_j$ may depend on j but for all j is ≥ 0 and < 1. Taking the sum
of all the $m_j$ corresponding to j = 0, 1, 2, ... n, we obtain the number
of favorable cases

$$M = \frac{l}{2\delta}(1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega) + n\Theta$$

where Θ again is a number satisfying the inequalities

$$0 \le \Theta < 1.$$

But, as is well known,

$$1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega = \frac{1}{2} + \frac{\sin\left(n + \frac{1}{2}\right)\omega}{2\sin\frac{\omega}{2}}$$

or, because ω = π/2n,

$$1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega = \frac{1}{2} + \frac{1}{2}\cot\frac{\omega}{2};$$

therefore

$$M = \frac{l}{4\delta}\cot\frac{\omega}{2} + \frac{l}{4\delta} + n\Theta.$$

Dividing this by N = (m + 1)(n + 1) and substituting for δ and ω
their expressions

$$\delta = \frac{h}{2m}, \qquad \omega = \frac{\pi}{2n}$$

we obtain the probability in the problem with a finite number of cases

$$\frac{M}{N} = \frac{l}{2h}\cdot\frac{m}{m+1}\cdot\frac{\cot\frac{\pi}{4n}}{n+1} + \frac{l}{2h}\cdot\frac{m}{(m+1)(n+1)} + \frac{n\Theta}{(n+1)(m+1)}.$$

The probability in Buffon's problem will be obtained by making m
and n increase indefinitely in the above expression. Now, since

$$\lim_{n\to\infty}\frac{\cot\frac{\pi}{4n}}{n+1} = \frac{4}{\pi},$$

we have

$$\lim\frac{M}{N} = \frac{2l}{h\pi}.$$

Thus we arrive at the expression of probability

$$p = \frac{2l}{h\pi}$$

in Buffon's needle problem.
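The passage to the limit can be watched numerically. The sketch below (Python; the function name and the chosen values of m and n are illustrative assumptions) counts the favorable cases of the finite problem directly and compares M/N with 2l/(hπ) for Wolf's dimensions.

```python
from math import ceil, cos, pi

def lattice_probability(l, h, m, n):
    """Ratio M/N of favorable to possible cases in the finite problem."""
    delta = h / (2 * m)
    omega = pi / (2 * n)
    favorable = 0
    for j in range(n + 1):
        bound = (l / 2) * cos(j * omega) / delta   # i must be strictly below this
        if bound > 1e-12:                          # guards against cos(pi/2) rounding to a tiny positive number
            # number of integers i >= 0 with i < bound, capped at m + 1
            favorable += min(ceil(bound), m + 1)
    return favorable / ((m + 1) * (n + 1))

if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, lattice_probability(36, 45, size, size), 2 * 36 / (pi * 45))
```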
Problems for Solution

Another very simple proof of Bernoulli's theorem, due to Tshebysheff (1821-
1894), is based upon the following considerations:

1. Prove the following identities:

$$\sum_{m=0}^{n} T_m(m - np) = 0, \qquad \sum_{m=0}^{n} T_m(m - np)^2 = npq.$$

Indication of the Proof. Differentiate the identity

$$e^{-npu}(pe^{u} + q)^n = \sum_{m=0}^{n} T_m e^{(m-np)u}$$

twice with respect to u and set u = 0.

2. If Q is the probability of the inequality |m - np| ≥ nε prove that

$$Q \le \frac{pq}{n\epsilon^2}.$$

Indication of the Proof. In the identity

$$\sum_{m=0}^{n} T_m(m - np)^2 = npq$$

drop all the terms in which |m - np| < nε and in the remaining terms replace

$$(m - np)^2$$

by $n^2\epsilon^2$. The resulting inequality

$$n^2\epsilon^2\sum_{|m-np| \ge n\epsilon} T_m \le npq$$

is equivalent to the statement.

3. Prove that

$$P > 1 - \eta$$

if $n > pq/\eta\epsilon^2$, where P is the probability of the inequality |m - np| < nε.

Indication of the Proof. $P = 1 - Q$, $Q \le pq/n\epsilon^2$ and $pq/n\epsilon^2 < \eta$ if $n > pq/\eta\epsilon^2$.
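The two identities of Prob. 1 and the inequality of Prob. 2 lend themselves to an exact numerical check. The following Python sketch is an illustration only; the function names and the sample values n = 100, p = 3/10, ε = 1/10 are assumptions, and rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import comb

def terms(n, p):
    """The probabilities T_0, ..., T_n of m successes in n trials."""
    q = 1 - p
    return [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]

def moments(n, p):
    """First and second moments of m - np."""
    T = terms(n, p)
    first = sum(T[m] * (m - n * p) for m in range(n + 1))
    second = sum(T[m] * (m - n * p) ** 2 for m in range(n + 1))
    return first, second

def tail_probability(n, p, eps):
    """Q: probability of the inequality |m - np| >= n * eps."""
    T = terms(n, p)
    return sum(T[m] for m in range(n + 1) if abs(m - n * p) >= n * eps)

def tshebysheff_bound(n, p, eps):
    return p * (1 - p) / (n * eps**2)

if __name__ == "__main__":
    n, p, eps = 100, Fraction(3, 10), Fraction(1, 10)
    print(moments(n, p), float(tail_probability(n, p, eps)), float(tshebysheff_bound(n, p, eps)))
```

For n = 2, p = 1/2, ε = 1/2 the bound is attained exactly, which is why the inequality of Prob. 2 is stated with ≤.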
The following two problems show how probability considerations can be used in 
proving purely analytical propositions. 

4. S. Bernstein's Proof of Weierstrass' Theorem. The famous theorem due to
Weierstrass states that for any continuous function f(x) in a closed interval
a ≤ x ≤ b there exists a polynomial P(x) such that

$$|f(x) - P(x)| < \sigma$$

for a ≤ x ≤ b where σ is an arbitrary positive number. By a proper linear
transformation the interval (a, b) can be transformed into the interval (0, 1).
According to S. Bernstein, the polynomial

$$P(x) = \sum_{m=0}^{n} C_n^m x^m(1 - x)^{n-m} f\left(\frac{m}{n}\right)$$

for sufficiently large n satisfies the inequality

$$|f(x) - P(x)| < \sigma$$

uniformly in the interval 0 ≤ x ≤ 1.

Indication of the Proof. For x = 0 and x = 1 we have f(0) = P(0) and

$$f(1) = P(1).$$

It suffices to prove the statement for 0 < x < 1. Let x be a constant probability in
n independent trials. We have

$$\text{(a)} \qquad f(x) - P(x) = \sum_{m=0}^{n} C_n^m x^m(1 - x)^{n-m}\left[f(x) - f\left(\frac{m}{n}\right)\right].$$

By the property of continuous functions, there is a number ε corresponding to any
positive number σ such that

$$|f(x') - f(x)| < \frac{\sigma}{2}$$

whenever

$$|x' - x| < \epsilon \qquad (0 \le x', x \le 1).$$

Also, there exists a number M such that |f(x)| ≤ M for 0 ≤ x ≤ 1. From equation
(a) we get

$$|f(x) - P(x)| \le \frac{\sigma}{2}P + 2MR$$

where P and R are, respectively, the probabilities of the inequalities

$$\left|\frac{m}{n} - x\right| < \epsilon \quad \text{and} \quad \left|\frac{m}{n} - x\right| \ge \epsilon.$$

Now P ≤ 1 and

$$R < \eta$$

if $n > 1/4\epsilon^2\eta$. Take η = σ/4M; then

$$|f(x) - P(x)| < \sigma$$

if

$$n > \frac{M}{\sigma\epsilon^2}.$$
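The convergence asserted in this problem can be observed numerically. The sketch below is a Python illustration under stated assumptions: the helper names, the test function |x - 1/2| and the evaluation grid are all choices made here, not taken from the source.

```python
from math import comb

def bernstein(f, n, x):
    """Bernstein polynomial of degree n for f, evaluated at x in [0, 1]."""
    return sum(comb(n, m) * x**m * (1 - x) ** (n - m) * f(m / n)
               for m in range(n + 1))

def max_error(f, n, grid=200):
    """Largest |f(x) - P(x)| over the points 0, 1/grid, ..., 1."""
    return max(abs(f(k / grid) - bernstein(f, n, k / grid))
               for k in range(grid + 1))

if __name__ == "__main__":
    f = lambda x: abs(x - 0.5)   # continuous but not differentiable at 1/2
    for n in (25, 100, 400):
        print(n, max_error(f, n))
```

The error decreases slowly for this function, which is in keeping with the bound n > M/σε² obtained in the indication of the proof.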


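5. Show that

$$\frac{\int_{\frac{m}{n}-\epsilon}^{\frac{m}{n}+\epsilon} x^m(1 - x)^{n-m}dx}{\int_0^1 x^m(1 - x)^{n-m}dx} > 1 - \frac{1}{2(n + 1)\epsilon^2}$$

provided 0 < m < n and ε > 0, m/n - ε > 0, m/n + ε < 1 (Castelnuovo).

Indication of the Proof. By Prob. 6, Chap. IV, page 72, the ratio

$$\frac{\int_0^p x^m(1 - x)^{n-m}dx}{\int_0^1 x^m(1 - x)^{n-m}dx}$$

represents the probability Q of at least m + 1 successes in a series of n + 1
independent trials with constant probability p. Set

$$p = \frac{m}{n} - \epsilon$$

whence

$$m + 1 = (n + 1)p + (n + 1)\sigma, \qquad \sigma = \frac{n - m}{n(n + 1)} + \epsilon > \epsilon.$$

But

$$Q < \frac{pq}{(n + 1)\sigma^2} < \frac{1}{4(n + 1)\epsilon^2}.$$

Hence

$$\frac{\int_0^{\frac{m}{n}-\epsilon} x^m(1 - x)^{n-m}dx}{\int_0^1 x^m(1 - x)^{n-m}dx} < \frac{1}{4(n + 1)\epsilon^2}$$

and by a similar argument

$$\frac{\int_{\frac{m}{n}+\epsilon}^{1} x^m(1 - x)^{n-m}dx}{\int_0^1 x^m(1 - x)^{n-m}dx} < \frac{1}{4(n + 1)\epsilon^2}.$$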
CHAPTER VII


APPROXIMATE EVALUATION OF PROBABILITIES IN
BERNOULLIAN CASE

1. In connection with Bernoulli's theorem, the following important
question arises: when the number of trials is large, how can one find, at
least approximately, the probability of the inequality

$$\left|\frac{m}{n} - p\right| \le \epsilon$$

where ε is a given number? Or, in a more general form: How can one
find, approximately, the probability of the inequalities

$$l \le m \le l'$$

where l and l' are given integers, the number of trials n being large?

The exact formula for this probability is

$$P = \sum_{s=l}^{l'} T_s$$

where $T_s$, as before, represents the probability of s successes in n trials.
While this formula cannot be of any practical use when n and l' - l
are large numbers, yet it is precisely such cases that present the greatest
theoretical and practical interest. Hence, the problem naturally arises
of substituting for the exact expression of P an approximate formula
which will be easy to use in practice and which, for large n, will give a
sufficiently close approximation to P. De Moivre was the first
successfully to attack this difficult problem. After him, in essentially the
same way, but using more powerful analytical tools, Laplace succeeded
in establishing a simple approximate formula which is given in all books
on probability.

When we use an approximate formula instead of an exact one, there
is always this question to consider: How large is the committed error?
If, as is usually done, this question is left unanswered, the derivation of
Laplace's formula becomes an easy matter. However, to estimate the
error, a comparatively long and detailed investigation is required. Except
for its length, this investigation is not very difficult.

2. First we shall present the probability $T_s$ in a convenient analytical
form. The identity

$$F(t) = (pt + q)^n = T_0 + T_1t + T_2t^2 + \cdots + T_nt^n$$
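after substituting $t = e^{i\varphi}$ becomes

$$F(e^{i\varphi}) = T_0 + T_1e^{i\varphi} + T_2e^{2i\varphi} + \cdots + T_ne^{ni\varphi}.$$

Multiplying it by $e^{-is\varphi}$ and integrating between -π and π, we get

$$\int_{-\pi}^{\pi} e^{-is\varphi}F(e^{i\varphi})d\varphi = 2\pi T_s$$

because for an integral exponent k

$$\int_{-\pi}^{\pi} e^{ik\varphi}d\varphi = 0 \ \text{ if } k \ne 0, \qquad \int_{-\pi}^{\pi} e^{ik\varphi}d\varphi = 2\pi \ \text{ if } k = 0.$$

Thus

$$T_s = \frac{1}{2\pi}\int_{-\pi}^{\pi} F(e^{i\varphi})e^{-is\varphi}d\varphi$$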
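and this is the expression for $T_s$ suitable for our purposes. To find the
sum

$$P = \sum_{s=l}^{l'} T_s$$

we observe first that

$$\sum_{s=l}^{l'} e^{-is\varphi} = \frac{e^{-i\left(l - \frac{1}{2}\right)\varphi} - e^{-i\left(l' + \frac{1}{2}\right)\varphi}}{2i\sin\frac{\varphi}{2}}.$$

On the other hand, the complex number $F(e^{i\varphi})$ can be presented in
trigonometrical form, thus:

$$F(e^{i\varphi}) = Re^{i\Theta}$$

whence

$$P = \frac{1}{4\pi i}\int_{-\pi}^{\pi} R\,\frac{e^{i\left[\Theta - \left(l - \frac{1}{2}\right)\varphi\right]} - e^{i\left[\Theta - \left(l' + \frac{1}{2}\right)\varphi\right]}}{\sin\frac{\varphi}{2}}d\varphi$$

or, because P is real,

$$P = \frac{1}{4\pi}\int_{-\pi}^{\pi} R\,\frac{\sin\left[\left(l' + \frac{1}{2}\right)\varphi - \Theta\right] - \sin\left[\left(l - \frac{1}{2}\right)\varphi - \Theta\right]}{\sin\frac{\varphi}{2}}d\varphi.$$

Finally, because R is an even function of φ and Θ is an odd one, we can
extend the integration over the interval 0, π on the condition that we
double the result:

$$P = \frac{1}{2\pi}\int_0^{\pi} R\,\frac{\sin\left[\left(l' + \frac{1}{2}\right)\varphi - \Theta\right]}{\sin\frac{\varphi}{2}}d\varphi - \frac{1}{2\pi}\int_0^{\pi} R\,\frac{\sin\left[\left(l - \frac{1}{2}\right)\varphi - \Theta\right]}{\sin\frac{\varphi}{2}}d\varphi.$$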

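It is convenient to introduce instead of l and l' two numbers $\zeta_1$ and $\zeta_2$
defined by

$$l = np + \tfrac{1}{2} + \zeta_1\sqrt{B_n}; \qquad l' = np - \tfrac{1}{2} + \zeta_2\sqrt{B_n}$$

where $B_n = npq$. Setting further

$$\Theta = np\varphi + \chi,$$

P can be presented as

$$P = P_2 - P_1$$

where $P_1$ and $P_2$ are obtained by taking $\zeta = \zeta_1$ and $\zeta = \zeta_2$ in the integral

$$\text{(1)} \qquad J = \frac{1}{2\pi}\int_0^{\pi} R\,\frac{\sin(\zeta\sqrt{B_n}\,\varphi - \chi)}{\sin\frac{\varphi}{2}}d\varphi.$$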
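The integral expression of $T_s$ obtained in this section can be verified numerically. Since $F(e^{i\varphi})e^{-is\varphi}$ is a trigonometric polynomial of order not exceeding n, the mean of its values at n + 1 equidistant points of the interval (-π, π) reproduces the integral exactly, up to rounding. The Python sketch below rests on that remark; the function name and the sample values are illustrative assumptions.

```python
import cmath
from math import comb, pi

def probability_by_inversion(n, p, s):
    """T_s computed as (1 / 2 pi) * integral of F(e^{i phi}) e^{-i s phi} over (-pi, pi)."""
    q = 1 - p
    points = n + 1                      # enough points to make the mean exact
    total = 0
    for k in range(points):
        phi = -pi + 2 * pi * (k + 0.5) / points
        total += (p * cmath.exp(1j * phi) + q) ** n * cmath.exp(-1j * s * phi)
    return (total / points).real

if __name__ == "__main__":
    n, p, s = 20, 0.3, 7
    print(probability_by_inversion(n, p, s), comb(n, s) * p**s * (1 - p) ** (n - s))
```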


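3. Our next aim is to establish upper and lower limits for R.
Evidently

$$R = (p^2 + q^2 + 2pq\cos\varphi)^{\frac{n}{2}} = \left(1 - 4pq\sin^2\frac{\varphi}{2}\right)^{\frac{n}{2}} = \rho^n.$$

Now

$$\log\rho = \frac{1}{2}\log\left(1 - 4pq\sin^2\frac{\varphi}{2}\right) = -2pq\sin^2\frac{\varphi}{2} - \frac{1}{4}(4pq)^2\sin^4\frac{\varphi}{2} - \frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} - \cdots$$

whence

$$\log\rho < -2pq\sin^2\frac{\varphi}{2}.$$

Since φ/2 ≤ π/2, we have

$$\sin\frac{\varphi}{2} \ge \frac{\varphi}{\pi}$$

and consequently

$$\log\rho < -\frac{2pq}{\pi^2}\varphi^2$$

or

$$\text{(2)} \qquad \rho < e^{-\frac{2pq}{\pi^2}\varphi^2}$$

for all values of φ in the interval of integration. On the other hand, we
have

$$\sin\frac{\varphi}{2} > \frac{\varphi}{2} - \frac{\varphi^3}{48} > 0 \quad \text{for } \varphi^2 < 24$$

and

$$\sin^2\frac{\varphi}{2} > \frac{\varphi^2}{4} - \frac{\varphi^4}{48}$$

which gives another upper bound for ρ:

$$\text{(3)} \qquad \rho < e^{-\frac{pq}{2}\varphi^2 + \frac{pq}{24}\varphi^4}.$$

The corresponding upper bounds for R will be

$$\text{(4)} \qquad R < e^{-\frac{2B_n}{\pi^2}\varphi^2}$$

$$\text{(5)} \qquad R < e^{-\frac{1}{2}B_n\varphi^2 + \frac{1}{24}B_n\varphi^4}.$$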


To find a lower bound for R we shall assume ^ g 7r/2. We can 
present log p thus: 


log p = - |(4pg)2 sin^ ^ - sin^ || 


. 2^1 


- g(4p?)® sin® I - 


On the other hand, 


g(4pg)® sin® | + i(4pg)^ sin® | + . . . < - 


i(4pg)® sin® | ^ 


1 — 4pg sin^ | 


<^(4pg)^sin®| 


and 


so that 


(ly 


<P ^ 1 • A <P 

2>3 ®“ 2 


2 pg 


~ sin^ 


^(4pg)5 sin® | 


^ 2 pg 

> sin^ 


4 _i( 

2 / 6 ' 

~ |(4pg)® sin® ^ sin^ ||l — 32p2g® sin^ |} ^ ® 


and consequently 


log P > _ i(42,g)2 giQ4 I > 


P? 2 4 

2 V' 2 ~ “ 4 "^ 
if φ ≤ π/2. Hence,

( 6 ) 


7 -» V. —~Bn<p^—\pqBn(p^ 

K > e ^ ^ 


and this is valid for φ ≤ π/2.

4. Let τ be defined by

$$\tau^2 = \frac{3}{\sqrt{B_n}}.$$

Assuming $B_n \ge 25$ from now on, we shall have

$$\tau^2 \le \tfrac{3}{5}$$

and a fortiori τ < π/2. Let us suppose now that φ varies in the interval
0 ≤ φ ≤ τ. By inequality (6) we shall have

E - - l) > - > 


1 r> 4 


because e~^ — 1 > —x for :r > 0 and pq ^ 

On the other hand, using inequality (5), we find that 


R - 


}2i‘^ -if 


< ■ f'" < 


Since 


Bnr^ 3 

1^24 =ie8<i. 


6^ — 6 ^ ^ 4 - 

From the two inequalities just established it follows that 

Ir 


B - 




2 


(7) 

in the interval 

0 ^ <p ^ r. 

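5. We turn now to the angle Θ. Evidently

$$\Theta = n\,\text{arc tg}\,\frac{p\sin\varphi}{q + p\cos\varphi} = n\omega$$

where

$$\omega = \text{arc tg}\,\frac{p\sin\varphi}{q + p\cos\varphi}.$$

By successive derivations with respect to φ we find

$$\frac{d\omega}{d\varphi} = \frac{p^2 + pq\cos\varphi}{p^2 + 2pq\cos\varphi + q^2}, \qquad \frac{d^2\omega}{d\varphi^2} = \frac{pq(p - q)\sin\varphi}{(p^2 + 2pq\cos\varphi + q^2)^2}$$

$$\frac{d^3\omega}{d\varphi^3} = pq(p - q)\,\frac{4pq + (1 - 2pq)\cos\varphi - 2pq\cos^2\varphi}{(p^2 + 2pq\cos\varphi + q^2)^3}$$

$$\frac{d^4\omega}{d\varphi^4} = pq(p - q)\sin\varphi\,\frac{-1 + 4pq + 20p^2q^2 + 8pq(1 - 2pq)\cos\varphi - 4p^2q^2\cos^2\varphi}{(p^2 + 2pq\cos\varphi + q^2)^4}$$

and for φ = 0

$$\left(\frac{d\omega}{d\varphi}\right)_0 = p; \qquad \left(\frac{d^2\omega}{d\varphi^2}\right)_0 = 0; \qquad \left(\frac{d^3\omega}{d\varphi^3}\right)_0 = pq(p - q).$$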
Furthermore, one 

! easily verifies that in the interval 0 ^ 

VII 


d^o> 

dv? 

HA 

001 CO 

1 

3l(l - 4pg sin* |) 



d*oo 

d<p‘^ 

g 2pq\p - 5'|^1 - 4^2 sin* ip- 


Hence, applying Taylor’s formula 

and supposing 0 ^ ^ i 

r, we get for % 

(8) 


X = iSn(p 

- q)i(? + 


where 





(9) 

\M\ < -^Bnlp 

- ?l(i - V<P'^)~\ 


or 





(10) 


X 

= L^ 


where 





(11) 

\L\ < xV-B„1p 

- 2l(i - 



Using inequalities (9) and (11), we easily find 

(12) sin {^y/Wnq> - x) = sin {t^/Rn<p) — lBn{p — q)(f^ cos (fV^^?) + r 
where 


(13) |r| < - g|(l -- pqr^) V® + - qYil - pqT^)-^<p^j 

provided 0 ^ ^ ^ r. 

6. To find an appropriate expression of the integral J we split it into 
two integrals, Ji and J' 2 , taken respectively between limits 0, r and r, tt. 
We have 






because sin ^ Let ti = t then by inequality (4) 

Z TV Z 


2Bn , 

^ ^ ^ \I‘K 


ri'i 

But for positive x the following inequality holds: 


(14) 

consequently 


/; 


e-^^du ^ 
u ^ 2x‘^’ 


"" T>dT ^ e-i-®”’’* 

< “ 




ZBl 

Noting that i?(^) is a decreasing function of tp we have for t g ^ g ri 
R{p) g R{r) < fe-sVF”. 


Hence, 




and combining this inequality with the one previously established, we 
have finally 


(15) 


|J,l<01og| + :^)c-ev^. 


7. More elaborate considerations are necessary to separate the 
principal term and to estimate the error term in Ji. Making use of the 
inequality 


sin X x\ 


< 


6 sin X 


we can present Ji thus: 


Ji 


where 




sin (^\/Wn<p — x) 


d<p “(*" A 


|A|< 


-Jo' 


R<pd<p) 


487r sin 

z 


and, because R < in the interval Q < <p <r 
Since ^ % we find by direct numerical calculation

< 0.0205, 

327r sin ^ 

and so, iSnally, 

lAl < 0.0205^-1. 

8* Referring now to inequality (7), we can write 

2 sin ^ 2 + A, 

AttJo (p 2xJo <p 

where 

Combining this with the result of the preceding section, we can present 
Ji thus 

(16) ^1=1- ~ .Ad<p + As 

Akjo ‘P 

and 

lAsI < 0.06055-b 

9. To simplify the integral in the right member of (16), we substitute 
for sin (f-\/jB„^ — x) its expression (12). Taking into account inequal- 
ity (13), we get (17): 

2 rVi.„,.sin (rV^y - x)^ ^ 2 rV,.,.,.gin_(r_ ^^) ^ _ 

27rJo ip 2xJo ip 

~ Jo ® ^ B„(p)dtp -f As 

where 

\M <^B„\p - s|(l - -f 

But 



T 
and so


I^ 3 L< - g|(l - pqr^)-^ + ^(p - q^il - pqr^y^B-K 

Now pq S S Bn 25, consequently 

--4=5-i(l - pqr^)-* g < 0.0385. 

4V2ir 20V^VlV 

On the other hand, 

X - pr> * 1 - 1{(^)’ - (s-’)} - m + 1*!” - ’>’• 

and for positive x the maximum of 

is attained for x^ — ^J4 zj whence it follows that 


Q4t 


\v 


?1(1 - PSr2)-« 



< 0.051. 


Taking into account all this, we have 

lAsI < 0.09ip - q\B-^. 

10. As to integrals in the right-hand member of (17) we can write 

(18) ^ f ^ f -I- A 4 

^TTjo <P 2tJo (p 

(19) cos (sVK<{>)d<p = 

" (tVK<p)dv + As 

where 


and 


because 


!A4^ 


Trjr <p Sir 


iAsI < 


^ ^ r 

6x 'bU 


e'^Vdu 



7r\/S 


f. 


e~^%Hu 


< xe~^^ 
for x > 1, as can easily be proved. Finally, taking into account (15),
(16), (17), (18), (19), we get


( 20 ) 

+ 

+ 


J 


Af" 

2irJo 




^ 0.065 + 0.09|p - q\ , 


cos {tVK<p)d>p 


since for Bn ^ 25 

^ log I -{- -I- A»! -f Aai < i. 

4 iog 2 -1- g 3^ + ^ 2 

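It now remains to evaluate definite integrals in (20). We have

$$\text{(21)} \qquad \frac{1}{\pi}\int_0^{\infty} e^{-\frac{1}{2}B_n\varphi^2}\,\frac{\sin(\zeta\sqrt{B_n}\,\varphi)}{\varphi}d\varphi = \frac{1}{\pi}\int_0^{\infty} e^{-\frac{1}{2}u^2}\,\frac{\sin\zeta u}{u}du$$

$$\text{(22)} \qquad -\frac{B_n(p - q)}{6\pi}\int_0^{\infty} e^{-\frac{1}{2}B_n\varphi^2}\varphi^2\cos(\zeta\sqrt{B_n}\,\varphi)d\varphi = -\frac{p - q}{6\pi\sqrt{B_n}}\int_0^{\infty} e^{-\frac{1}{2}u^2}u^2\cos\zeta u\,du.$$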

Differentiating the well-known integral

$$\int_0^{\infty} e^{-ax^2}\cos bx\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}} \qquad (a > 0)$$

twice with respect to b, and after that substituting a = 1/2, b = ζ, we
find for (22) this expression:

$$\frac{q - p}{6\sqrt{2\pi B_n}}(1 - \zeta^2)e^{-\frac{1}{2}\zeta^2}.$$

On the other hand, an integral of the type

$$L(a) = \int_0^{\infty} e^{-\frac{1}{2}u^2}\,\frac{\sin au}{u}du$$

can be reduced to a so-called "probability integral." In fact, the
derivation with respect to a gives

$$\frac{dL}{da} = \int_0^{\infty} e^{-\frac{1}{2}u^2}\cos au\,du = \sqrt{\frac{\pi}{2}}\,e^{-\frac{a^2}{2}}$$

and since L(0) = 0,

$$L(a) = \sqrt{\frac{\pi}{2}}\int_0^{a} e^{-\frac{1}{2}u^2}du.$$

Consequently, integral (21) can be reduced to

$$\frac{1}{\sqrt{2\pi}}\int_0^{\zeta} e^{-\frac{1}{2}u^2}du.$$

Having found an approximate expression of the integral J, after
substituting in it $\zeta_2$ and $\zeta_1$ for ζ and taking the difference of the results, we
find the desired expression of P.

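11. The result of this long and detailed investigation can be
summarized as follows:

Theorem. Let m be the number of occurrences of an event in a series
of n independent trials with the constant probability p. The probability P
of the inequalities

$$np + \tfrac{1}{2} + \zeta_1\sqrt{npq} \le m \le np - \tfrac{1}{2} + \zeta_2\sqrt{npq}$$

where extreme members are integers, can be represented in the form

$$\text{(23)} \qquad P = \frac{1}{\sqrt{2\pi}}\int_{\zeta_1}^{\zeta_2} e^{-\frac{1}{2}u^2}du + \frac{q - p}{6\sqrt{2\pi npq}}\left[(1 - \zeta^2)e^{-\frac{1}{2}\zeta^2}\right]_{\zeta_1}^{\zeta_2} + \omega.$$

The error term ω satisfies the inequality

$$|\omega| < \frac{0.13 + 0.18|p - q|}{npq} + e^{-\frac{3}{2}\sqrt{npq}}$$

provided npq ≥ 25.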
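To see how formula (23) and its error bound behave, one may compare them with an exact computation. The Python sketch below is an illustration under assumptions: the function names are hypothetical, and the values n = 9,000, p = 1/3, 2,910 ≤ m ≤ 3,090 are chosen here merely as a convenient case with npq = 2,000. The probability p is passed as a fraction p_num/p_den so that the exact sum can be formed with integers.

```python
from math import comb, erf, exp, pi, sqrt

def formula_23(n, p_num, p_den, lo, hi):
    """Main terms of (23) for lo <= m <= hi (integers), and the bound on |omega|."""
    p = p_num / p_den
    q = 1 - p
    B = n * p * q
    z1 = (lo - n * p - 0.5) / sqrt(B)       # lo = np + 1/2 + zeta_1 * sqrt(npq)
    z2 = (hi - n * p + 0.5) / sqrt(B)       # hi = np - 1/2 + zeta_2 * sqrt(npq)
    g = lambda z: (1 - z * z) * exp(-z * z / 2)
    main = 0.5 * (erf(z2 / sqrt(2)) - erf(z1 / sqrt(2)))
    skew = (q - p) / (6 * sqrt(2 * pi * B)) * (g(z2) - g(z1))
    bound = (0.13 + 0.18 * abs(p - q)) / B + exp(-1.5 * sqrt(B))
    return main + skew, bound

def exact_probability(n, p_num, p_den, lo, hi):
    """Exact sum of T_s for lo <= s <= hi, using integer arithmetic."""
    numerator = sum(comb(n, s) * p_num**s * (p_den - p_num) ** (n - s)
                    for s in range(lo, hi + 1))
    return numerator / p_den**n

if __name__ == "__main__":
    approx, bound = formula_23(9000, 1, 3, 2910, 3090)
    print(approx, bound, exact_probability(9000, 1, 3, 2910, 3090))
```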

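By slightly increasing the limit of the error term, this theorem can
be put into more convenient form. Let $t_1$ and $t_2$ be two arbitrary real
numbers and let P denote the probability of the inequalities

$$np + t_1\sqrt{npq} \le m \le np + t_2\sqrt{npq}.$$

If the greatest integers contained in

$$np + t_2\sqrt{npq} \quad \text{and} \quad nq - t_1\sqrt{npq}$$

are, respectively, $\Lambda_2$ and $\Lambda_1$, the preceding inequalities are equivalent to

$$n - \Lambda_1 \le m \le \Lambda_2.$$

To apply the theorem, we set

$$np - \tfrac{1}{2} + \zeta_2\sqrt{npq} = \Lambda_2 = np + t_2\sqrt{npq} - \theta_2$$

$$np + \tfrac{1}{2} + \zeta_1\sqrt{npq} = n - \Lambda_1 = np + t_1\sqrt{npq} + \theta_1$$

$\theta_2$ and $\theta_1$ being, respectively, the fractional parts of $np + t_2\sqrt{npq}$ and
$nq - t_1\sqrt{npq}$. Hence,

$$\zeta_2 = t_2 + \frac{\frac{1}{2} - \theta_2}{\sqrt{npq}}, \qquad \zeta_1 = t_1 - \frac{\frac{1}{2} - \theta_1}{\sqrt{npq}}.$$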
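Applying Taylor's formula, it is easy to verify that

$$\left|\frac{1}{\sqrt{2\pi}}\int_{\zeta_1}^{\zeta_2} e^{-\frac{1}{2}u^2}du - \frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{1}{2}u^2}du - \frac{\left(\frac{1}{2} - \theta_1\right)e^{-\frac{1}{2}t_1^2} + \left(\frac{1}{2} - \theta_2\right)e^{-\frac{1}{2}t_2^2}}{\sqrt{2\pi npq}}\right| < \frac{0.061}{npq}$$

$$\left|\frac{q - p}{6\sqrt{2\pi npq}}\left[(1 - \zeta^2)e^{-\frac{1}{2}\zeta^2}\right]_{\zeta_1}^{\zeta_2} - \frac{q - p}{6\sqrt{2\pi npq}}\left[(1 - t^2)e^{-\frac{1}{2}t^2}\right]_{t_1}^{t_2}\right| < \frac{0.069|p - q|}{npq}$$

whence, finally, we can draw the following conclusion: For any two
real numbers $t_1$, $t_2$, the probability of the inequalities

$$t_1\sqrt{npq} \le m - np \le t_2\sqrt{npq}$$

can be expressed as follows:

$$P = \frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{1}{2}u^2}du + \frac{\left(\frac{1}{2} - \theta_1\right)e^{-\frac{1}{2}t_1^2} + \left(\frac{1}{2} - \theta_2\right)e^{-\frac{1}{2}t_2^2}}{\sqrt{2\pi npq}} + \frac{q - p}{6\sqrt{2\pi npq}}\left[(1 - t_2^2)e^{-\frac{1}{2}t_2^2} - (1 - t_1^2)e^{-\frac{1}{2}t_1^2}\right] + \Omega$$

where $\theta_2$ and $\theta_1$ are the respective fractional parts of

$$np + t_2\sqrt{npq} \quad \text{and} \quad nq - t_1\sqrt{npq}$$

and

$$|\Omega| < \frac{0.20 + 0.25|p - q|}{npq} + e^{-\frac{3}{2}\sqrt{npq}}$$

provided npq ≥ 25.

In particular, if $t_2 = -t_1 = t$, the probability of the inequality

$$|m - np| \le t\sqrt{npq}$$

is expressed by

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}du + \frac{1 - \theta_1 - \theta_2}{\sqrt{2\pi npq}}\,e^{-\frac{1}{2}t^2} + \Omega$$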
with the same upper limit for 0. Laplace, supposing that np + t\/npq 
is an integer in which case ^2 = 0 and is a fraction less than {npq)“^^, 
gives for P the approximate expression 


$$P = \frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}du + \frac{e^{-\frac{1}{2}t^2}}{\sqrt{2\pi npq}}$$

without indicating the limit of the error. Evidently Laplace's formula
coincides with the formula obtained here by a rigorous analysis, save for
terms of the same order as the error term Ω.
To find an approximate expression for the probability P of the
inequality

$$\left|\frac{m}{n} - p\right| \le \epsilon$$

it suffices to take

$$t = \epsilon\sqrt{\frac{n}{pq}}.$$

Then

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{\epsilon\sqrt{\frac{n}{pq}}} e^{-\frac{1}{2}u^2}du + \frac{1 - \theta_1 - \theta_2}{\sqrt{2\pi npq}}\,e^{-\frac{n\epsilon^2}{2pq}} + \Omega$$

and evidently P tends to 1 as n increases indefinitely. This is the second
proof of Bernoulli's theorem.

Referring to the above expression for the probability of the inequalities

$$t_1\sqrt{npq} \le m - np \le t_2\sqrt{npq}$$

and supposing that the number of trials n increases indefinitely while
$t_1$ and $t_2$ remain fixed, we immediately perceive the truth of the following
limit theorem: The probability of the inequalities

$$t_1 \le \frac{m - np}{\sqrt{npq}} \le t_2$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{1}{2}u^2}du$$

as n tends to infinity.

This limit theorem is a very particular case of an extremely general
theorem which we shall consider in Chap. XIV.

12. To form an idea of the accuracy to be expected by using the
foregoing approximate formulas, it is worth while to take up a few
numerical examples. Let n = 200, p = q = 1/2 and

$$95 \le m \le 105.$$

The exact expression of the probability that m will satisfy these
inequalities is

$$P = \frac{200!}{100!\,100!}\left(\frac{1}{2}\right)^{200}\left[1 + 2\left(\frac{100}{101} + \frac{100\cdot 99}{101\cdot 102} + \frac{100\cdot 99\cdot 98}{101\cdot 102\cdot 103} + \frac{100\cdot 99\cdot 98\cdot 97}{101\cdot 102\cdot 103\cdot 104} + \frac{100\cdot 99\cdot 98\cdot 97\cdot 96}{101\cdot 102\cdot 103\cdot 104\cdot 105}\right)\right].$$

The number in the brackets is found to be 9.995776 and its logarithm to
five decimals

0.99982.

The logarithm of the first factor, again to five decimals, is

$$\bar{2}.75088,$$

whence

$$\log P = \bar{1}.75070; \qquad P = 0.56325,$$

and this value may be regarded as correct to five decimals. Let us see
now what result is obtained by using approximate formulas. In our
example

$$t\sqrt{npq} = t\sqrt{50} = 5; \qquad t = \frac{1}{\sqrt{2}} = 0.707107$$

and

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}du = 0.52050.$$

The additional term

$$\frac{e^{-0.25}}{\sqrt{100\pi}} = 0.04394$$

and by Laplace's formula

P = 0.56444.

This is greater than the true value of P by 0.00119. Now, the theoretical
limit of the error is nearly

$$\frac{0.20}{50} = 0.004$$

so that, actually, Laplace's formula gives an even closer approximation
than can be expected theoretically.

When npq is large, the second term in Laplace's formula ordinarily
is omitted and the probability is computed by using a simpler expression:

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}du.$$

In our case this expression would give

P = 0.52050

instead of 0.56325 with the error about 0.043, which amounts to about
8 per cent of the exact number. Such a comparatively large error is
explained by the fact that in our example npq = 50 is not large enough.
In practice, when npq attains a few hundreds, the simplified expression for
P can be used when an accuracy of about two or three decimals is
considered as satisfactory. In general, the larger t is, the better
approximation can be expected.
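The figures of this first example are easily reproduced. The following Python sketch (function name hypothetical) computes the exact sum, Laplace's formula with the additional term, and the simplified expression for n = 200, p = q = 1/2, 95 ≤ m ≤ 105; the probability integral is expressed through the standard error function, since $\frac{2}{\sqrt{2\pi}}\int_0^t e^{-u^2/2}du = \text{erf}(t/\sqrt{2})$.

```python
from math import comb, erf, exp, pi, sqrt

def first_example(n=200, lo=95, hi=105):
    """Exact, Laplace and simplified values for the symmetric case p = q = 1/2."""
    exact = sum(comb(n, s) for s in range(lo, hi + 1)) / 2**n
    npq = n / 4
    t = (hi - n / 2) / sqrt(npq)            # t * sqrt(npq) = 5 in the example
    simplified = erf(t / sqrt(2))
    laplace = simplified + exp(-t * t / 2) / sqrt(2 * pi * npq)
    return exact, laplace, simplified

if __name__ == "__main__":
    exact, laplace, simplified = first_example()
    print(f"exact {exact:.5f}  Laplace {laplace:.5f}  simplified {simplified:.5f}")
```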

For the second example, let us evaluate the probability that in 6,520
trials the relative frequency of an event with the probability p = 3/5
will differ from that probability by less than ε = 1/50. To find t, we
have the equation

$$t\sqrt{npq} = \epsilon n$$

where

$$n = 6520, \quad p = \tfrac{3}{5}, \quad q = \tfrac{2}{5}, \quad \epsilon = \tfrac{1}{50},$$

which gives

$$t = \frac{130.4}{\sqrt{1564.8}} = 3.2965,$$

and, correspondingly,

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}du = 0.999021.$$

Since m satisfies the inequalities

$$3912 - 130.4 \le m \le 3912 + 130.4$$

the fractions $\theta_1$ and $\theta_2$ are $\theta_1 = \theta_2 = 0.4$ and the additional term is

$$\frac{0.2}{\sqrt{3129.6\pi}}\,e^{-\frac{1}{2}t^2} = 0.000009.$$

Hence, the approximate value of P is

P = 0.999030.

To judge what is the error, we can apply Markoff's method of
continued fractions to find the limits between which P lies. These limits are

0.999028 and 0.999044.

The result obtained by using an approximate formula is unusually good,
which can be explained by the fact that in our example t is a rather large
number. Even the simplified formula gives 0.999021, very near the
true value.
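The second example involves the fractions $\theta_1$, $\theta_2$, which are sensitive to rounding when a limit happens to be an integer. The Python sketch below therefore takes p and the deviations from np as exact fractions. It is an illustration only: the function name is an assumption, and the error term Ω is simply omitted.

```python
from fractions import Fraction
from math import erf, exp, floor, pi, sqrt

def laplace_with_corrections(n, p, d1, d2):
    """Approximate probability of np + d1 <= m <= np + d2, error term omitted.

    p, d1, d2 are Fractions (or integers) so that the fractional parts
    theta_1 and theta_2 are found without rounding error.
    Returns the probability integral and the sum of the additional terms.
    """
    q = 1 - p
    npq = float(n * p * q)
    t1, t2 = float(d1) / sqrt(npq), float(d2) / sqrt(npq)
    upper, lower = n * p + d2, n * q - d1
    theta2 = float(upper - floor(upper))
    theta1 = float(lower - floor(lower))
    e1, e2 = exp(-t1 * t1 / 2), exp(-t2 * t2 / 2)
    c = 1 / sqrt(2 * pi * npq)
    main = 0.5 * (erf(t2 / sqrt(2)) - erf(t1 / sqrt(2)))
    boundary = c * ((0.5 - theta1) * e1 + (0.5 - theta2) * e2)
    skew = c * float(q - p) / 6 * ((1 - t2 * t2) * e2 - (1 - t1 * t1) * e1)
    return main, boundary + skew

if __name__ == "__main__":
    d = Fraction(652, 5)                     # 130.4 = 6520 / 50
    main, extra = laplace_with_corrections(6520, Fraction(3, 5), -d, d)
    print(f"{main:.6f} + {extra:.6f} = {main + extra:.6f}")
```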

Finally, let us apply our formulas to the solution of the inverse
problem: How large should the number of trials be to secure a probability
larger than a given fraction for the inequality

$$\left|\frac{m}{n} - p\right| \le \epsilon?$$

Let us take, for example, p = 1/3, ε = 0.01 and the lower limit of
probability 0.999. To find n approximately, we first determine t by the
equation

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}du = 0.999,$$

which gives

t = 3.291.

Hence,

$$n = \frac{2}{9}\cdot 10^4\cdot(3.291)^2 = 24{,}066, \text{ approximately.}$$

We cannot be sure that this limit is precise, since an approximate formula
was used. But it can serve as an indication that for n exceeding this
limit by a comparatively small amount, the probability in question will
be > 0.999. For instance, let us take n = 24,300. The limits for m
being

$$8{,}100 - 243 \le m \le 8{,}100 + 243,$$

we find t from the equation

$$t\sqrt{npq} = 243, \qquad t = \frac{243}{\sqrt{5400}} = 3.3068$$

and correspondingly

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}du = 0.999057.$$

The additional term in Laplace's formula being 0.000023, we find

P > 0.99908 - 0.00006 > 0.999.

Thus, 24,300 trials surely satisfy all the requirements.
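The steps of the inverse problem can be sketched as follows (Python; the function names are hypothetical and the bisection routine is merely one convenient way of solving the equation for t). Because t is here carried to more decimals than the 3.291 of the text, the computed n may differ by a few units from the printed figure; the check for n = 24,300 subtracts the limit of the error Ω from the approximate value.

```python
from math import ceil, erf, exp, pi, sqrt

def solve_t(target, lo=0.0, hi=10.0):
    """Root of (2 / sqrt(2 pi)) * integral_0^t e^{-u^2/2} du = target, by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if erf(mid / sqrt(2)) < target:
            lo = mid
        else:
            hi = mid
    return hi

def approximate_trials(p, eps, target):
    """n = p q t^2 / eps^2, obtained from t * sqrt(npq) = eps * n."""
    t = solve_t(target)
    return ceil(p * (1 - p) * t * t / eps**2)

def guaranteed_lower_limit(n, p, half_width):
    """Laplace's value minus the limit of the error, when np +- half_width are integers."""
    q = 1 - p
    npq = n * p * q
    t = half_width / sqrt(npq)
    value = erf(t / sqrt(2)) + exp(-t * t / 2) / sqrt(2 * pi * npq)
    error_limit = (0.20 + 0.25 * abs(p - q)) / npq + exp(-1.5 * sqrt(npq))
    return value - error_limit

if __name__ == "__main__":
    print(solve_t(0.999), approximate_trials(1 / 3, 0.01, 0.999))
    print(guaranteed_lower_limit(24300, 1 / 3, 243))
```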


Problems for Solution

1. Find approximately the probability that the number of successes will be
contained between 2,910 and 3,090 in 9,000 independent trials with constant
probability 1/3. Ans. 0.9570 with an error in absolute value < $10^{-4}$ [using (23)].

2. In Buffon's experiment a coin was tossed 4,040 times, with the result that heads
turned up 2,048 times. What would be the probability of having more than 2,050
or less than 1,990 heads? Ans. 0.337.

3. R. Wolf threw a pair of dice 100,000 times and noted that 83,533 times the
numbers of points on the two dice were different. What is the probability of having
such an event occur not less than 83,533 or not more than 83,133 times? Does the
result suggest a doubt that for each die the probability of any number of points was 1/6?
Ans. This probability is approximately 0.0898 and on account of its smallness some
doubt may exist.

4. If the probability of an event E is 1/2, what number of trials guarantees a
probability of more than 0.999 that the difference between the relative frequency of
E and 1/2 will be in absolute value less than 0.01? Ans. 27,500.

5. If a man plays 10,000 equitable games, staking $1 in each game, what is the
probability that the increase or decrease in his fortune will not exceed $20 or $50?
Ans. (a) 0.166; (b) 0.390.

6. If a man plays 100,000 games of craps and stakes 50 cents in each game, what
is the probability that he will lose less than $300? Ans. About 1/200.

7. Following the method developed in this chapter, prove the following formula
for the probability of exactly m successes in n independent trials with constant
probability p:

$$T_m = \frac{e^{-\frac{1}{2}t^2}}{\sqrt{2\pi npq}}\left[1 + \frac{(q - p)(t^3 - 3t)}{6\sqrt{npq}}\right] + \Delta$$

where t is determined by the equation

$$m - np = t\sqrt{npq}$$

and

$$|\Delta| < \frac{0.15 + 0.25|p - q|}{(npq)^{\frac{3}{2}}} + e^{-\frac{3}{2}\sqrt{npq}}$$

provided npq ≥ 25.

8. Developments of this chapter can be greatly simplified if p = q = 1/2
(symmetrical case). In this case one can prove the following statement: The probability
of the inequalities


n 1 , 





can be expressed as follows: 


— r 






12-v^ 27rn 


where [A] < 1/2^® for w > 16. 

9. In case of "rare" events, the probability p may be so small that even for a
large number of trials the quantity λ = np may be small; for example, 10 or less.
In cases of this kind, approximation formulas of the type of Laplace's cannot be used
with confidence. To meet such cases, Poisson proposed approximate formulas of a
different character. Let $P_m$ represent the probability that in n trials an event with
the probability p will occur not more than m times. Show that

$$P_m = e^{-\lambda}\left(1 + \frac{\lambda}{1} + \frac{\lambda^2}{1\cdot 2} + \cdots + \frac{\lambda^m}{1\cdot 2\cdot 3\cdots m}\right) + \Delta = Q_m + \Delta$$

where

$$|\Delta| < (e^{\chi} - 1)Q_m \ \text{ if } Q_m \ge \tfrac{1}{2}, \qquad |\Delta| < (e^{\chi} - 1)(1 - Q_m) \ \text{ if } Q_m \le \tfrac{1}{2}$$

and

$$\chi = \frac{\lambda + \frac{1}{4} + \frac{\lambda^3}{n}}{2(n - \lambda)}.$$
Indication of the Proof. We have
1 




Now, since g = 1 — 


X nX2 

^ + + ' • ■ + 
X 




1 • 2 • 3 • • • m 




n 


(i ‘Yi "'i (i 


and 


- V ifc-oX ~ - V 


< e 


^ I. 

2 ~izi (x+|)g 
yfc«0 ^ ^2(n — X) 


Consequently 


But 




2n. 


whence

$$P_m < e^{\chi} Q_m.$$

On the other hand,


= 6""^ 


X X2 X’” 

1 - 2 ‘ 1-¥*3 • • • w 


1 = 


= 2 


n(re - 1) • • • (n - M + 1) „ 

^ gn UpM = 


/* = Wl-j-l 


== r 




X^ 


ju;s= 7»+ 1 


1-2-3 


whence

$$1 - P_m < e^{\chi}(1 - Q_m)$$

and

$$P_m > e^{\chi} Q_m + 1 - e^{\chi}.$$

The final statement follows immediately from both inequalities obtained for $P_m$.
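10. With the usual notation, show that

$$T_m = \frac{\lambda^m}{m!}e^{-\lambda}Q$$

where

$$e^{\frac{m\lambda}{n} - \frac{(n-m)\lambda^2}{2n^2} - \frac{m(m-1)}{2n} - \frac{(n-m)\lambda^3}{3(n-\lambda)^3} - \frac{m^3}{2n(n-m)}} < Q < e^{\frac{m\lambda}{n} - \frac{(n-m)\lambda^2}{2n^2} - \frac{m(m-1)}{2n}}.$$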

Indication of the Proof. Referring to Chap. I, page 23, we have 


< 0 < 1 . 


m!\ n/ \ 2nJ 

T.>4,-X-'L-^rv. 

m\\ n) \ n/ 


But 


whence 


0-r 


< e 


^ (n. — w)X2 

^ n 2n^ , 


1 - ■ 


2n 


< e 


m(m— 1) 
2n , 


Xm 


mX {n — m)X^ 


Tm < — • e ^ 

ml 


2n 


On the other hand, 


-(n.-»t) I ^m^X (n — m)X^ (n — m)X^ 

> e 2(7i-X)2 3(71~X)3 


/ \m — l / \w — 1 m(m— -1) 

(l--) 2 =(l+— 2 >, 2(X=^. 

\ n/ \ n — mj 


Hence 




mX (n--m)X^ — (n — m)X^ 

> € n 2n2 2n Z{n-~X)^ 2n(n — m). 


and a fortiori 


A 


xy-»^^ _ V 


— "x — (n — m)X^ __ m(w-— 1) 

> e n 2n2 2n 


(?i — m)X3 

3(?i - X)® 


2n{n — m) 


If $\lambda$ and m are both small in comparison to n the above-introduced factor Q will be
near 1. Under such circumstances we may be entitled to use an approximate formula
due to Poisson:

$$T_m = \frac{\lambda^m}{m!}e^{-\lambda}.$$

The preceding elementary analysis gives means to estimate the error incurred by using
this formula.

11. Apply the preceding considerations to the case n = 1,000, p = 1/100,
and m = 10. Ans. $0.1256 < T_{10} < 0.1258$. Poisson's formula gives 0.1251, a
very good approximation. Also, $0.5807 < P_{10} < 0.5863$. Taking $P_{10} = 0.583$, the
error in absolute value will be less than $3.3 \cdot 10^{-3}$. By a more elaborate method it is
found $P_{10} = 0.5830$.
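The figures of this exercise can be reproduced directly. The sketch below computes the exact binomial values for n = 1,000, p = 1/100, m = 10, the Poisson values, and the two-sided bound on $P_{10}$. It assumes that the quantity $\chi$ of Prob. 9 reads $(\lambda + 1/4 + \lambda^3/n) / (2(n - \lambda))$, which is consistent with the printed answers; the variable names are illustrative only.

```python
from math import comb, exp, factorial

n, p, m = 1000, 0.01, 10
lam = n * p  # lambda = np


def binomial_term(k: int) -> float:
    """Exact probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_term(k: int) -> float:
    """Poisson's approximation to the same probability."""
    return lam ** k * exp(-lam) / factorial(k)


T10 = binomial_term(m)
P10 = sum(binomial_term(k) for k in range(m + 1))
Q10 = sum(poisson_term(k) for k in range(m + 1))

# Assumed form of chi (see the lead-in); it gives bounds on P10 around Q10.
chi = (lam + 0.25 + lam ** 3 / n) / (2 * (n - lam))
lower_bound = exp(chi) * Q10 + 1 - exp(chi)
upper_bound = exp(chi) * Q10
```

The width of the interval, `upper_bound - lower_bound`, equals $e^{\chi} - 1$, which is below 0.006 here.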
CHAPTER VIII


FURTHER CONSIDERATIONS ON GAMES OF CHANCE 

1. When a person undertakes to play a very large number of games
under theoretically identical conditions, the inference to be drawn from
Bernoulli's theorem is that that person will almost certainly be ruined
if the mathematical expectation of his gain in a single game is negative.
In case of a positive expectation, on the other hand, he is very likely to
win as large a sum as he likes in a sufficiently long series of games.
Finally, in an equitable game when the mathematical expectation of a
gain is zero, the only inference to be drawn from Bernoulli's theorem is
that his gain or loss will likely be small in comparison with the number of
games played.

These conclusions are appropriate, however, only if it is possible to
continue the series of games indefinitely, with an agreement to postpone
the final settling of accounts until the end of the series. But if the 
settlement, as in ordinary gambling, is made at the end of each game, 
it may happen that even playing a profitable game one will lose all his 
money and will have to discontinue playing long before the number of 
games becomes large enough to enable him to realize the advantages 
which continuation of the games would bring to him. 

A whole series of new problems arises in this connection, known as 
problems on the duration of play or ruin of gamblers. Since the science 
of probability had its humble origin in computing chances of players in 
different games, the important question of the ruin of gamblers was 
discussed at a very early stage in the historical development of the 
theory of probability. The simplest problem of this kind was solved by 
Huygens, who in this field had such great successors as de Moivre, 
Lagrange, and Laplace. 

2. It is natural to attack the problem first in its simplest aspect, and 
then to proceed to more involved and difficult questions. 

Problem 1. Two players A and B play a series of games, the proba- 
bility of winning a single game being p for A and q for B, and each game 
ends with a loss for one of them. If the loser after each game gives his 
adversary an amount representing a unit of money and the fortunes of 
A and B are measured by the whole numbers a and b, what is the
probability that A (or B) will be ruined if no limit is set for the number of
games?


Solution. It is necessary first to show how we can attach a definite
numerical value to the probability of the ruin of A if no limit is set for
the number of games. As in many similar cases (see, for instance, Prob.
15, page 41) we start by supposing that a limit is set. Let n be this
limit. There is only a finite number of mutually exclusive ways in which
A can be ruined in n games; either he can be ruined just after the first
game, or just after the second, and so on. Denoting by $p_1, p_2, \ldots, p_n$
the probabilities for A to be ruined just after the first, second, . . . nth
game, the probability of his ruin before or at the nth game is

$$p_1 + p_2 + \cdots + p_n.$$

Now, this sum, being a probability, must remain $\leq 1$ whatever n is.
On the other hand, each term of this sum is $\geq 0$ for the same reason.
Both remarks combined show that the series

$$p_1 + p_2 + p_3 + \cdots$$

is convergent. We take its sum as the probability for A to be ruined
when nothing limits the number of games played. So it is clear that
this probability, although unknown, possesses a perfectly determined
numerical value. Let us denote by $y_x$ the probability for A to be ruined
when his fortune is x. The probability we seek is $y_a$. Obviously,

(1) $y_0 = 1,$

for A is certainly ruined if he has no money left. Similarly

(2) $y_{a+b} = 0$

because if the fortune of A is a + b, it means that B has no money
wherewith to play, and certainly the ruin of A is then impossible. Further,
considering the result of the game immediately following the situation
in which the fortune of A amounted to x it is possible to establish an
equation in finite differences which $y_x$ satisfies. For, if A wins this game
(the probability of which case is p), his fortune becomes x + 1 and the
probability of being ruined later is $y_{x+1}$. By the theorem of compound
probability, the probability of this case is $p\,y_{x+1}$. But if A loses (the
probability of which is q), his fortune becomes x - 1 and the probability
that the one possessing this fortune will be ruined is $y_{x-1}$. The
probability of this case is $q\,y_{x-1}$. Now, applying the theorem of total
probability, we arrive at the equation

(3) $y_x = p\,y_{x+1} + q\,y_{x-1}.$

This equation has a particular solution of the form $\alpha^x$ where $\alpha$ is a
root of the equation

$$\alpha = p\alpha^2 + q.$$
If $p \neq q$ there are two roots

$$\alpha = 1 \quad \text{and} \quad \alpha = \frac{q}{p},$$

and, correspondingly, there are two distinct particular solutions of
equation (3):

$$1 \quad \text{and} \quad \left(\frac{q}{p}\right)^x.$$

Obviously,

$$y_x = C + D\left(\frac{q}{p}\right)^x$$

is also a solution of (3) for arbitrary C and D. Now, we can dispose of
C and D so as to satisfy conditions (1) and (2). To this end we have the
equations

$$C + D = 1$$

$$p^{a+b}C + q^{a+b}D = 0$$

whence

$$C = \frac{q^{a+b}}{q^{a+b} - p^{a+b}}, \qquad D = -\frac{p^{a+b}}{q^{a+b} - p^{a+b}}$$

and

$$y_x = \frac{q^{a+b}p^x - p^{a+b}q^x}{p^x(q^{a+b} - p^{a+b})}.$$

It remains to take x = a to obtain the required probability

$$y_a = \frac{q^a(q^b - p^b)}{q^{a+b} - p^{a+b}} = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}$$

that the player A possessing the fortune a will be ruined. Similarly,
the probability of the ruin of B is

$$z_b = \frac{p^b(p^a - q^a)}{p^{a+b} - q^{a+b}}.$$

It turns out that

$$y_a + z_b = 1,$$

so that the probability that the series of games will continue indefinitely
without A or B being ruined, is 0. The probability 0 does not show the
impossibility of an eternal game, because this number was obtained,
not by direct enumeration of cases, but by passage to the limit.
Theoretically, an eternal game is not excluded. Actually, of course, this
possibility can be disregarded.
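The solution for unequal chances can be verified with exact rational arithmetic. The following sketch (function name and sample values are illustrative) evaluates $y_x$ in the equivalent form $(r^x - r^{a+b}) / (1 - r^{a+b})$ with $r = q/p$, checks the difference equation (3) and the end conditions (1) and (2), and confirms that the two ruin probabilities add up to 1.

```python
from fractions import Fraction


def ruin_probability(x: int, total: int, p: Fraction) -> Fraction:
    """y_x for p != q, where total = a + b is the combined fortune."""
    q = 1 - p
    r = q / p
    return (r ** x - r ** total) / (1 - r ** total)


p = Fraction(3, 5)     # sample chance of A winning a single game
q = 1 - p
a, b = 3, 4            # sample fortunes of A and B
total = a + b

y = [ruin_probability(x, total, p) for x in range(total + 1)]

# Ruin of B is the same problem seen from B's side: fortune b, chance q.
y_a = y[a]
z_b = ruin_probability(b, total, q)
```

With these sample values `y_a + z_b` is exactly 1, and every interior value satisfies `y[x] == p * y[x + 1] + q * y[x - 1]`.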
If $p = q = 1/2$ so that each single game is equitable, the preceding
solution must be modified. In this case, the above quadratic equation
in $\alpha$ has two coincident roots $\alpha = 1$, and we have only one particular
solution of (3), $y_x = 1$. But another particular solution in this case is

$$y_x = x$$

so that we can assume

$$y_x = C + Dx$$

and determine C and D from the equations

$$C = 1; \quad C + D(a + b) = 0.$$

Thus, we find that

$$y_x = 1 - \frac{x}{a + b}$$

and for x = a

$$y_a = \frac{b}{a + b}.$$

Similarly, giving $z_b$ the same meaning as above,

$$z_b = \frac{a}{a + b}.$$

If, therefore, each single game is equitable, the probabilities of ruin are 
inversely proportional to the fortunes of the players. The practical 
conclusion to be derived from this theoretical result is sheer common 
sense: It is unwise to play indefinitely with an adversary whose fortune 
is very large without submitting oneself to the great risk of losing all 
one's money in the course of the games, even if each single game is 
equitable. Gamblers who gamble at an even game with any willing 
individual are in the same condition as if they were gambling with an 
infinitely rich adversary. Their ruin in the long run is practically 
certain. 
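The passage to the limit used in the solution of Prob. 1 can be watched numerically. The sketch below, an illustration under the stated boundary conditions rather than the author's procedure, computes the probability that A is ruined within a limited number of games by repeated use of the difference equation, and shows that in the equitable case it grows toward $b/(a + b)$ as the limit is raised. The function name is hypothetical.

```python
def ruin_within(a: int, b: int, p: float, n_games: int) -> float:
    """Probability that A (fortune a) is ruined within n_games against B (fortune b)."""
    total = a + b
    q = 1.0 - p
    # With no games left, A counts as ruined only if his fortune is already 0.
    z = [1.0] + [0.0] * total
    for _ in range(n_games):
        nxt = z[:]                      # z[0] = 1 and z[total] = 0 stay fixed
        for x in range(1, total):
            nxt[x] = p * z[x + 1] + q * z[x - 1]
        z = nxt
    return z[a]


a, b = 2, 6
limit_value = b / (a + b)               # 0.75 for these sample fortunes
after_10 = ruin_within(a, b, 0.5, 10)
after_100 = ruin_within(a, b, 0.5, 100)
after_2000 = ruin_within(a, b, 0.5, 2000)
```

The values increase with the number of games allowed and settle at 0.75, the ratio of the adversary's fortune to the combined fortune.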

If single games of the series are not equitable, that is, $p \neq q$, the
conclusion may be different. Supposing p > q, we have a case when
the expectation of A is positive; in each single game, A has an advantage
over his adversary. The above expression for $y_a$ may be written in the
form

$$y_a = \left(\frac{q}{p}\right)^a \frac{1 - \left(\frac{q}{p}\right)^b}{1 - \left(\frac{q}{p}\right)^{a+b}}$$

and, because q/p < 1, it is easy to see that $y_a$ remains always less than

$$\left(\frac{q}{p}\right)^a$$

and converges to this number when b becomes infinite. Thus, playing a
series of advantageous games even against an infinitely rich adversary,
the probability of escaping ruin is

$$1 - \left(\frac{q}{p}\right)^a.$$

If a is large enough, this can be made as near 1 as we please, so that a
player with a large fortune has good reason to believe that in the course
of the games he will never be ruined, but that actually he is very likely
to win a large sum of money.

This conclusion again is confirmed by experience. Big gambling
institutions, like the Casino at Monte Carlo, always reserve certain
advantages to themselves, and, although they are willing to play with
practically everybody (as if they played against an infinitely rich
adversary) the chance of their being ruined is slight because of the large
capital in their possession.

3. In the problem solved above the stakes of both players were 
supposed to be equal, and we took them as units to measure the fortunes 
of both players. Next it would be interesting to investigate the case in
which the stakes of A and B are unequal. An exact solution of this
modified problem, since it depends on a difference equation of higher
order, would be too complicated to be of practical use. It is therefore
extremely interesting that, following an ingenious method developed by 
A. A. Markoff, one can establish simple inequalities for the required 
probabilities which give a good approximation if the fortunes of the 
players are large in comparison with their stakes. 

Problem 2. If the conditions presupposed in Prob. 1 are modified,
in that the stakes of A and B measured in a convenient unit are $\alpha$ and
$\beta$ and their respective fortunes are a and b, find the probabilities for A or
B to be ruined in the sense that at a certain stage the capital of A will
become less than $\alpha$ or that of B less than $\beta$.

Solution. Let $y_x$ be the probability for A to be forced out of the
game by the lack of sufficient money to set a full stake $\alpha$ when his
fortune amounts to x and consequently that of his adversary is $a + b - x$.
In the same way as before, we find that $y_x$ is a solution of the equation
in finite differences:

(4) $y_x = p\,y_{x+\beta} + q\,y_{x-\alpha}.$

To determine $y_x$ completely, in addition to (4), we have two sets of
supplementary conditions:

(5) $y_0 = y_1 = \cdots = y_{\alpha-1} = 1$

(6) $y_{a+b} = y_{a+b-1} = \cdots = y_{a+b-\beta+1} = 0.$
Equation (5) expresses the fact that if the fortune of A becomes less
than his stake, it is certain that A must quit. On the contrary, equation
(6) indicates the impossibility for A to be ruined if the other player B
does not have enough money to continue gaming. Equation (4) is an
ordinary equation in finite differences of the order $\alpha + \beta$. It has
particular solutions of the form $\theta^x$ where $\theta$ is a root of the equation

(7) $p\theta^{\alpha+\beta} - \theta^{\alpha} + q = 0.$

The left-hand member for $\theta = 0$ is positive and with increasing $\theta$
decreases and attains a minimum when

$$\theta^{\beta} = \frac{\alpha}{p(\alpha + \beta)}$$

and then steadily increases and assumes positive values for large $\theta$.
This minimum must be negative or zero because $\theta = 1$ is a root of (7).
Now, if it is negative, there are two positive roots of (7). One of them
is $\theta = 1$ and another > or < 1 according as

$$p < \frac{\alpha}{\alpha + \beta} \quad \text{or} \quad p > \frac{\alpha}{\alpha + \beta}$$

or else

$$p\beta - q\alpha < 0 \quad \text{or} \quad > 0.$$

That is, the positive root of (7) different from 1 is > 1 when single games
are favorable to B and < 1 if they are favorable to A. In case of equitable
games, both positive roots coincide and $\theta = 1$ is a double root of (7).
All the other roots of (7) are negative or imaginary.

The regular way to solve the problem would be to write down the
general solution of (4) involving $\alpha + \beta$ arbitrary constants to be
determined by conditions (5) and (6). As this method would lead to a
complicated expression for $y_a$, we shall refrain from seeking the exact solution
of our problem, and instead, following A. A. Markoff's ingenious remark,
we shall establish simple lower and upper limits for $y_a$ which are close
enough if the fortunes of the players are large in comparison with their
stakes.

Lemma. If $y_x$ is a solution of equation (4) and none of the numbers

$$y_0, y_1, \ldots, y_{\alpha-1}$$

$$y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

is negative, then $y_x \geq 0$ for x = 0, 1, 2, . . . a + b.

Proof. Let $u_x^{(k)}$ (k = 0, 1, 2, . . . $\alpha - 1$) represent the probability
that the player A whose actual fortune is x (and that of his adversary
$a + b - x$) will be forced to quit when his fortune becomes exactly = k.
Evidently $u_x^{(k)}$ is a solution of equation (4) satisfying the conditions

$u_x^{(k)} = 0$ for x = 0, 1, . . . k - 1, k + 1, . . . $\alpha - 1$; a + b,
a + b - 1, . . . $a + b - \beta + 1$; $u_k^{(k)} = 1$.

Similarly, if $v_x^{(l)}$ (l = 0, 1, 2, . . . $\beta - 1$) represents the probability that
the player B will be forced to quit when the fortune of A becomes exactly
= a + b - l, $v_x^{(l)}$ will be a solution of (4) satisfying the conditions

$v_x^{(l)} = 0$ for x = 0, 1, 2, . . . $\alpha - 1$; a + b, . . . a + b - l + 1,
a + b - l - 1, . . . $a + b - \beta + 1$; $v_{a+b-l}^{(l)} = 1$.

Thus we get $\alpha + \beta$ particular solutions of (4), and it is almost evident
that these solutions are independent. Moreover, since they represent
probabilities, $u_x^{(k)} \geq 0$, $v_x^{(l)} \geq 0$ for x = 0, 1, 2, . . . a + b. Now, any
solution $y_x$ of (4) with given values of

$$y_0, y_1, \ldots, y_{\alpha-1}$$

$$y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

can be represented thus

$$y_x = \sum_{k=0}^{\alpha-1} y_k u_x^{(k)} + \sum_{l=0}^{\beta-1} y_{a+b-l} v_x^{(l)}.$$

Hence, $y_x \geq 0$ for x = 0, 1, 2, . . . a + b if none of the numbers

$$y_0, y_1, \ldots, y_{\alpha-1}$$

$$y_{a+b}, y_{a+b-1}, \ldots, y_{a+b-\beta+1}$$

is negative. This interesting property of the solutions of equation (4)
derived almost intuitively from the consideration of probabilities can be
established directly. (See Prob. 9, page 160.)

The lemma just proved yields almost immediately the following
proposition: If for any two solutions $y'_x$ and $y''_x$ of equation (4) the
inequality

$$y''_x \geq y'_x$$

holds for

x = 0, 1, 2, . . . $\alpha - 1$; a + b, a + b - 1, . . . $a + b - \beta + 1$,

the same inequality will be true for all x = 0, 1, 2, . . . a + b. It
suffices to notice that $y_x = y''_x - y'_x$ is a solution of the linear equation
(4) and, by hypothesis, $y_x \geq 0$ for x = 0, 1, 2, . . . $\alpha - 1$; a + b,
a + b - 1, . . . $a + b - \beta + 1$.

Now we can come back to our problem. First, if the mathematical
expectation of A

$$p\beta - q\alpha$$

is different from 0, equation (7) has two positive roots: 1 and $\theta$. With
arbitrary constants C and D

$$y'_x = C + D\theta^x$$

is a solution of (4). Whatever C and D may be, $y'_x$ as a function of x
varies monotonically. Therefore, if C and D are determined by the
conditions

$$y'_0 = 1, \quad y'_{a+b-\beta+1} = 0$$

we shall have

$y'_x \leq 1$ if x = 0, 1, 2, . . . $\alpha - 1$

$y'_x \leq 0$ if x = $a + b - \beta + 1$, . . . a + b

and by the above established lemma, taking into account conditions (5)
and (6), we shall have for the required probability the following inequality

$$y_x \geq y'_x;$$

or, substituting the explicit expression for $y'_x$,

$$y_x \geq \frac{\theta^{a+b-\beta+1} - \theta^x}{\theta^{a+b-\beta+1} - 1}.$$

If, on the contrary, C and D are determined by

$$y'_{\alpha-1} = 1, \quad y'_{a+b} = 0$$

we shall have

$y'_x \geq 1$ if x = 0, 1, 2, . . . $\alpha - 1$

$y'_x \geq 0$ if x = $a + b - \beta + 1$, . . . a + b

and

$$y_x \leq \frac{\theta^{a+b-\alpha+1} - \theta^{x-\alpha+1}}{\theta^{a+b-\alpha+1} - 1}.$$

Finally, taking x = a, we obtain the following limits for the initial
probability $y_a$:

$$\frac{\theta^a(\theta^{b-\beta+1} - 1)}{\theta^{a+b-\beta+1} - 1} \leq y_a \leq \frac{\theta^{a-\alpha+1}(\theta^b - 1)}{\theta^{a+b-\alpha+1} - 1}.$$

They give a sufficient approximation to $y_a$ if a and b are large
compared with $\alpha$ and $\beta$.

If each single game is equitable, equation (4) has a solution with two
arbitrary constants:

$$y'_x = C + Dx.$$

Proceeding in the same way as before, we obtain the inequalities

$$\frac{b - \beta + 1}{a + b - \beta + 1} \leq y_a \leq \frac{b}{a + b - \alpha + 1}.$$
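The limits for $y_a$ can be compared with a direct numerical solution of equation (4). The sketch below is an illustration under assumptions of its own: it finds the positive root of (7) different from 1 by bisection, evaluates the two limits for unequal chances, and obtains $y_a$ itself by repeated substitution in (4) with conditions (5) and (6). The function names and sample values are hypothetical, and the case of zero expectation is excluded.

```python
def other_positive_root(p: float, alpha: int, beta: int) -> float:
    """Positive root of p*th**(alpha+beta) - th**alpha + q = 0 other than 1."""
    q = 1.0 - p
    if p * beta == q * alpha:
        raise ValueError("equitable game: 1 is a double root")

    def f(th: float) -> float:
        return p * th ** (alpha + beta) - th ** alpha + q

    th_min = (alpha / (p * (alpha + beta))) ** (1.0 / beta)  # where f is least
    if p * beta - q * alpha > 0:        # favourable to A: root lies below th_min
        lo, hi = 0.0, th_min
    else:                               # favourable to B: root lies above th_min
        lo, hi = th_min, 2.0 * th_min
        while f(hi) <= 0:
            hi *= 2.0
    f_lo = f(lo)
    for _ in range(200):                # plain bisection on a sign change
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def markoff_limits(p, alpha, beta, a, b):
    """Lower and upper limits for y_a when the expectation is not zero."""
    th = other_positive_root(p, alpha, beta)
    lower = th ** a * (th ** (b - beta + 1) - 1) / (th ** (a + b - beta + 1) - 1)
    upper = th ** (a - alpha + 1) * (th ** b - 1) / (th ** (a + b - alpha + 1) - 1)
    return lower, upper


def exact_ruin(p, alpha, beta, a, b, sweeps=200_000, tol=1e-14):
    """y_x for all x, by repeated substitution in equation (4)."""
    q = 1.0 - p
    total = a + b
    y = [1.0 if x < alpha else 0.0 for x in range(total + 1)]
    for _ in range(sweeps):
        change = 0.0
        for x in range(alpha, total - beta + 1):
            new = p * y[x + beta] + q * y[x - alpha]
            change = max(change, abs(new - y[x]))
            y[x] = new
        if change < tol:
            break
    return y


# Sample case: A stakes 2 against 1 and wins with probability 0.7.
lower, upper = markoff_limits(0.7, 2, 1, 10, 10)
y_a = exact_ruin(0.7, 2, 1, 10, 10)[10]
```

For this sample case the two limits enclose the directly computed value of $y_a$; when both stakes equal 1 the two limits coincide with the solution of Prob. 1.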
4. To simplify the analysis, it was supposed that nothing limited the
number of games played by A and B so that an eternal game, although 
extremely improbable, was theoretically possible. We now turn to 
problems in which the number of games is limited. 

Problem 3. Players A and B agree to play not more than n games.
The probabilities of winning a single game are p and q, respectively, and
the stakes are equal. Taking these stakes as monetary units, the fortune
of A is measured by the whole number a and that of B is infinite or at
least so large that he cannot be ruined in n games. What is the
probability for A to be ruined in the course of n games?

Solution. Let $y_{x,t}$ represent the probability for A to be ruined when
his fortune is measured by the number x and he cannot play more than
t games. The reasoning we have used several times shows that $y_{x,t}$
satisfies a partial equation in finite differences:

(8) $y_{x,t} = p\,y_{x+1,t-1} + q\,y_{x-1,t-1}.$

Moreover, if A has no money left, his ruin is certain, which gives the
condition

(9) $y_{0,t} = 1$ if $t \geq 0$.

On the other hand, if A still possesses money and cannot play any more,
his ruin is impossible, so that

(10) $y_{x,0} = 0$ if x > 0.

Conditions (9) and (10) together with equation (8) determine $y_{x,t}$
completely for all positive values of x and t. To find an explicit
expression for $y_{x,t}$ we shall use Lagrange's method. Equation (8) has particular
solutions of the form

$$\alpha^x\beta^t$$

where $\alpha$ and $\beta$ satisfy the relation

$$\alpha\beta = p\alpha^2 + q.$$

We can solve this equation either for $\beta$ or for $\alpha$ which leads to two different
expressions of $y_{x,t}$. Solving for $\beta$ we have infinitely many particular
solutions

$$\alpha^x(p\alpha + q\alpha^{-1})^t$$

with an arbitrary $\alpha$ and we can seek to obtain the required solution in the
form

$$y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^{x-1}(p\alpha + q\alpha^{-1})^t f(\alpha)\,d\alpha$$
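where $f(\alpha)$ is supposed to be developable in Laurent's series on a certain
circle c. To satisfy (10) we must have

$$\frac{1}{2\pi i}\int_c \alpha^{x-1}f(\alpha)\,d\alpha = 0 \quad \text{for } x = 1, 2, 3, \ldots$$

which shows that $f(\alpha)$ is regular within the circle c. To determine $f(\alpha)$
completely, we must have, according to (9)

$$\frac{1}{2\pi i}\int_c (p\alpha + q\alpha^{-1})^t f(\alpha)\frac{d\alpha}{\alpha} = 1 \quad \text{for } t = 0, 1, 2, \ldots$$

All these equations are equivalent to a single equation

$$\frac{1}{2\pi i}\int_c \frac{f(\alpha)\,d\alpha}{\alpha - p\epsilon\alpha^2 - q\epsilon} = \frac{1}{1 - \epsilon}$$

holding good for all sufficiently small $\epsilon$. The integrand has a single pole
$\alpha_0$ within c defined by

$$\alpha_0 - p\epsilon\alpha_0^2 - q\epsilon = 0$$

and the corresponding residue is

$$\frac{f(\alpha_0)}{1 - 2p\epsilon\alpha_0} = \frac{q + p\alpha_0^2}{q - p\alpha_0^2}f(\alpha_0).$$

But this must be equal to

$$\frac{1}{1 - \epsilon}$$

or, substituting for $\epsilon$ its expression in $\alpha_0$

$$\frac{q + p\alpha_0^2}{p\alpha_0^2 - \alpha_0 + q}$$

and hence for all sufficiently small $\alpha_0$

$$f(\alpha_0) = \frac{q - p\alpha_0^2}{p\alpha_0^2 - \alpha_0 + q},$$

that is, if

$$f(\alpha) = \frac{q - p\alpha^2}{p\alpha^2 - \alpha + q}$$

all the requirements are satisfied. Taking into account that p + q = 1,
we have

$$f(\alpha) = \frac{1}{1 - \alpha} + \frac{p\alpha}{q - p\alpha}$$

and also

$$f(\alpha) = 1 + \sum_{n=1}^{\infty}\left[1 + \left(\frac{p}{q}\right)^n\right]\alpha^n.$$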
and also 
The expression for $y_{x,t}$ is therefore

$$y_{x,t} = \frac{1}{2\pi i}\sum_{n=0}^{\infty}\int_c \alpha^{x-1}(p\alpha + q\alpha^{-1})^t C_n\alpha^n\,d\alpha$$

where $C_0 = 1$ and $C_n = 1 + (p/q)^n$ if $n \geq 1$.

It remains to find the coefficient of $1/\alpha$ in the development of the
integrand in a series of descending powers of $\alpha$. Since

$$\alpha^{x-1}(p\alpha + q\alpha^{-1})^t = \sum_{l=0}^{t}\binom{t}{l}p^l q^{t-l}\alpha^{x-1-t+2l}$$

this coefficient is given by the sum

$$\sum_{l=0}^{\frac{t-x}{2}}\binom{t}{l}p^l q^{t-l}C_{t-x-2l}$$

extended over all integers l from 0 up to the greatest integer not exceeding
$\frac{t-x}{2}$. Hence, the final expression for the probability $y_{a,n}$ is

(11) $$y_{a,n} = \sum_{l=0}^{\frac{n-a}{2}}\binom{n}{l}p^l q^{a+l}\left(p^{n-a-2l} + q^{n-a-2l}\right)$$

with the agreement, in case of an even n - a, to replace the sum
$p^0 + q^0$ corresponding to $l = \frac{n-a}{2}$ by 1. It is natural that the right-hand
member of the preceding expression should be replaced by 0 if n < a,
which is in perfect agreement with the fact that A cannot be ruined in less
than a games.
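Formula (11) can be tested against the difference equation (8) with conditions (9) and (10). The sketch below uses exact rational arithmetic; it reads (11) as the sum over l of $\binom{n}{l} p^l q^{a+l}(p^{n-a-2l} + q^{n-a-2l})$ with the last bracket replaced by 1 when $n - a - 2l = 0$, which is an interpretation of the printed formula. Function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def ruin_by_formula_11(a: int, n: int, p: Fraction) -> Fraction:
    """y_{a,n} from the closed sum, for a >= 1."""
    q = 1 - p
    if n < a:
        return Fraction(0)
    total = Fraction(0)
    for l in range((n - a) // 2 + 1):
        r = n - a - 2 * l
        bracket = 1 if r == 0 else p ** r + q ** r   # the agreed replacement
        total += comb(n, l) * p ** l * q ** (a + l) * bracket
    return total


def ruin_by_recurrence(a: int, n: int, p: Fraction) -> Fraction:
    """y_{a,n} from equation (8) with conditions (9) and (10)."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def y(x: int, t: int) -> Fraction:
        if x == 0:
            return Fraction(1)
        if t == 0:
            return Fraction(0)
        return p * y(x + 1, t - 1) + q * y(x - 1, t - 1)

    return y(a, n)


p = Fraction(2, 5)
sample = ruin_by_formula_11(1, 3, p)    # equals q + p*q**2 for a = 1, n = 3
```

For a = 1 and n = 3 both routes give $q + pq^2$: ruin at the first game, or a win followed by two losses.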

The second form of solution is obtained if we express $\alpha$ as a function of
$\beta$. The equation

$$p\alpha^2 - \alpha\beta + q = 0$$

having two roots, we shall take for $\alpha$ the root

$$\alpha = \frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}$$

determined by the condition that it vanishes for infinitely large positive
$\beta$ and can be developed in power series of $1/\beta$ when $|\beta| > 2\sqrt{pq}$. Using
$\alpha$ in this perfectly determined sense, it is easy to verify that

$$y_{x,t} = \frac{1}{2\pi i}\int_c\left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x\frac{\beta^t}{\beta - 1}\,d\beta$$

where c is a circle of radius > 1 described from 0 as its center, satisfies all
the requirements. For it is a solution of equation (8). Next, for x = 0
and $t \geq 0$,

$$y_{0,t} = \frac{1}{2\pi i}\int_c\frac{\beta^t\,d\beta}{\beta - 1} = 1$$

and, finally, for t = 0 and x > 0

$$y_{x,0} = \frac{1}{2\pi i}\int_c\left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x\frac{d\beta}{\beta - 1} = 0$$

because the development of the integrand into power series of $1/\beta$
starts at least with the second power of $1/\beta$.

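To find $y_{x,t}$ in explicit form, it remains to find the coefficient of $1/\beta$
in the development of

$$\left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x\frac{\beta^t}{\beta - 1}$$

in a series of descending powers of $\beta$. Let

$$\left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x = \frac{l_x}{\beta^x} + \frac{l_{x+1}}{\beta^{x+1}} + \frac{l_{x+2}}{\beta^{x+2}} + \cdots;$$

multiplying this series by

$$\frac{\beta^t}{\beta - 1} = \beta^{t-1} + \beta^{t-2} + \cdots + 1 + \frac{1}{\beta} + \frac{1}{\beta^2} + \cdots$$

we find that the coefficient of $1/\beta$ in the product is

$$l_x + l_{x+1} + \cdots + l_t$$

and hence

$$y_{x,t} = l_x + l_{x+1} + \cdots + l_t$$

provided $t \geq x$, for otherwise $y_{x,t} = 0$. The quadratic equation in $\alpha$
can be written in the form

$$\alpha = \frac{1}{\beta}(q + p\alpha^2)$$

and the development of any power of its root vanishing for $\beta = \infty$ into
power series of $1/\beta$ can be obtained by application of Lagrange's series.
We have

$$\alpha^x = \sum_{n=1}^{\infty}\frac{1}{n!\,\beta^n}\left[\frac{d^{n-1}\,x\xi^{x-1}(q + p\xi^2)^n}{d\xi^{n-1}}\right]_{\xi=0}.$$

But

$$\frac{1}{n!}\left[\frac{d^{n-1}\,x\xi^{x-1}(q + p\xi^2)^n}{d\xi^{n-1}}\right]_{\xi=0} = \frac{x(x + 2i - 1)!}{i!\,(x + i)!}p^i q^{x+i}$$

if n = x + 2i, and = 0 if n = x + 2i + 1. Hence,

$$l_{x+2i} = \frac{x(x + 2i - 1)!}{i!\,(x + i)!}p^i q^{x+i}, \qquad l_{x+2i+1} = 0,$$

and finally

(12) $$y_{a,n} = q^a\left[1 + \frac{a}{1}pq + \frac{a(a+3)}{1 \cdot 2}p^2q^2 + \cdots + \frac{a(a+k+1)\cdots(a+2k-1)}{1 \cdot 2 \cdots k}p^kq^k\right]$$

where $k = \frac{n-a}{2}$ or $k = \frac{n-a-1}{2}$ according as n and a are of the
same parity or not.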
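Formula (12) lends itself to the same kind of check. In the sketch below the coefficient $a(a+i+1)\cdots(a+2i-1)/(1 \cdot 2 \cdots i)$ is computed in the equivalent form $\frac{a}{a+2i}\binom{a+2i}{i}$, and the result is compared with the values given by the difference equation (8). Names are illustrative, and exact fractions are used throughout.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def ruin_by_formula_12(a: int, n: int, p: Fraction) -> Fraction:
    """y_{a,n} as q**a times a polynomial in p*q, for a >= 1."""
    q = 1 - p
    if n < a:
        return Fraction(0)
    k = (n - a) // 2                    # covers both parities of n - a
    total = Fraction(0)
    for i in range(k + 1):
        # a(a+i+1)...(a+2i-1)/i!  written as  a/(a+2i) * C(a+2i, i)
        coefficient = Fraction(a * comb(a + 2 * i, i), a + 2 * i)
        total += coefficient * (p * q) ** i
    return q ** a * total


def ruin_by_recurrence(a: int, n: int, p: Fraction) -> Fraction:
    """y_{a,n} from equation (8) with conditions (9) and (10)."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def y(x: int, t: int) -> Fraction:
        if x == 0:
            return Fraction(1)
        if t == 0:
            return Fraction(0)
        return p * y(x + 1, t - 1) + q * y(x - 1, t - 1)

    return y(a, n)


p = Fraction(1, 3)
q = 1 - p
value = ruin_by_formula_12(2, 4, p)     # q**2 * (1 + 2*p*q)
```

The term of index i in the bracket, multiplied by $q^a$, is the probability of ruin at exactly the (a + 2i)th game.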

5. The difference $y_{a,n} - y_{a,n-1}$ gives the probability for the player A
to be ruined at exactly the nth game and not before. Now, this
difference is 0 if n differs from a by an odd number, so that the probability of
ruin at the (a + 2i - 1)st game is 0. That is almost evident because
after every game the fortune of A is increased or diminished by 1 and
therefore can be reduced to 0 only if the number of games played is of
the same parity as a. If n = a + 2i, the difference $y_{a,n} - y_{a,n-1}$ is

$$\frac{a(a + i + 1) \cdots (a + 2i - 1)}{1 \cdot 2 \cdot 3 \cdots i}p^i q^{a+i}.$$

Such, therefore, is the probability for A to be ruined at exactly the
(a + 2i)th game. The remarkable simplicity of this expression obtained
by means which are not quite elementary leads to a suspicion that it
might also be obtained in a simple way. And, indeed, there is a simple
way to arrive at this expression and thus to have a third, elementary,
solution of Prob. 3.

Considering the possible results of a series of a + 2i games, let A
stand for a game won by A, and B for a game lost by A. The result of
every series will thus be represented by a succession of letters A and B.
We are interested in finding all the sequences which ruin A at exactly
the last game. Because the fortune of A sinks from a to 0 there must be
i letters A and i + a letters B in every sequence we consider. Besides,
there is another important condition. Let us imagine that the sequence
is divided into two arbitrary parts, one containing the first letter and
another the last letter of the sequence. Let x be the number of letters B,
and y that of letters A in the second or right part of the sequence. There
will be a + i - x letters B and i - y letters A in the first or left part.
It means that the fortune of A after a game corresponding to the last
letter in the left part, becomes

$$a - (a + i - x) + (i - y) = x - y$$

and since A cannot be ruined before the (a + 2i)th game, x must always
be > y. That is, counting letters A and B from the right end of the
sequence, the number of letters B must surpass the number of letters A
at every stage. Conversely, if this condition is satisfied the succession
represents a series of games resulting in the ruin of A at the end of the
series and not before.

To find directly the number of sequences satisfying this requirement 
is not so easy, and it is much easier, following an ingenious method 
proposed by D. André, to find the number of all the remaining sequences
of i letters A and i + a letters B. These can be divided into two classes:
those ending with A and those ending with B. Now, it is easy to show 
that there exists a one-to-one correspondence between successions of these 
two classes, so that both classes contain the same number of sequences. 
For, in a sequence of the second class (ending with B) starting from 
the right end, we necessarily find a shortest group of letters containing 
A and B in equal numbers. This group must end with A. Writing 
letters of this group in reverse order without changing the preceding 
letters, we obtain a sequence of the first class ending with A. Con- 
versely, in a sequence of the first class there is a shortest group at the 
right end ending with B and containing an equal number of letters A and 
B. Writing letters of this group in reverse order, we obtain a sequence 
of the second class. 

An example will illustrate the described manner of establishing the 
one-to-one correspondence between sequences of the first and of the 
second class. Consider a sequence of the first kind 

B|BBABAA.

The vertical bar separates the shortest group from the right containing 
letters A and B in equal numbers. Reversing the order of letters in this 
group, we obtain a sequence of the second class 

B|AABABB

and this sequence, by application of the above rule, is transformed again 
into the original sequence of the first class. The number of sequences 
of the first class can now be easily found. It is the same as the number of 
all possible sequences of i — 1 letters A and a + i letters B, that is, 

$$\frac{(a + 2i - 1)!}{(i - 1)!\,(a + i)!} = \frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)}.$$

The total number of sequences in both classes is

$$2\,\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)}.$$

Hence, the number of sequences leading to ruin of A in exactly a + 2i
games is

$$\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i)}{1 \cdot 2 \cdots i} - 2\,\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)} = \frac{a(a + i + 1) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots i}.$$

As the probability of gains and losses indicated by every such sequence
is the same, namely, $p^i q^{a+i}$, the probability of the ruin of A in exactly
a + 2i games is

$$\frac{a(a + i + 1) \cdots (a + 2i - 1)}{1 \cdot 2 \cdot 3 \cdots i}p^i q^{a+i}$$

and hence the second expression found for $y_{a,n}$ follows immediately.
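For small values of a and i the count of sequences can be confirmed by brute force. The sketch below, an illustration with hypothetical function names, lists every sequence of a + 2i games, keeps those in which the fortune of A first reaches 0 at the last game, and compares the count with $a(a+i+1)\cdots(a+2i-1)/(1 \cdot 2 \cdots i)$, written here as $\frac{a}{a+2i}\binom{a+2i}{i}$.

```python
from itertools import product
from math import comb


def count_ruin_sequences(a: int, i: int) -> int:
    """Number of sequences of a + 2i games ruining A at exactly the last game."""
    n = a + 2 * i
    count = 0
    for outcome in product((+1, -1), repeat=n):   # +1: A wins, -1: A loses
        fortune = a
        ruined_at = None
        for game, step in enumerate(outcome, start=1):
            fortune += step
            if fortune == 0:
                ruined_at = game
                break
        if ruined_at == n:
            count += 1
    return count


def closed_form(a: int, i: int) -> int:
    """a(a+i+1)...(a+2i-1)/i!, as an exact integer."""
    return a * comb(a + 2 * i, i) // (a + 2 * i)


example = count_ruin_sequences(1, 3)    # a = 1, i = 3: sequences of 7 games
```

The case a = 1, i = 3 is the one used in the worked example of the correspondence, with sequences of three letters A and four letters B.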

The problem concerning the probability of ruin in the course of a 
prescribed number of games for a player playing against an infinitely 
rich adversary was first considered by de Moivre, who gave both the 
preceding solutions without proof; it was later solved completely by 
Lagrange and Laplace. The elementary treatment can be found in 
Bertrand's "Calcul des probabilités."

6. Formulas (11) and (12), though elegant and useful when n is not 
large, become impracticable when n is somewhat large, and that is pre- 
cisely the most interesting case. Since the question of the risk of ruin 
incurred in playing equitable games possesses special interest, it would not 
be out of place at least to indicate here, though without proof, a
convenient approximate expression for the probability $y_{a,n}$ in case of a large
n and $p = q = 1/2$. Let t be defined by


vwny 

then for n ^ 50 it is possible to establish the approximate formula 


ya,n 


2 n 

VttJo 


+ 


fin 


where $-1 < \theta < 1$. Suppose, for instance, that the fortune of a player
amounts to $100, each stake being $1, and he decides to play 1,000,
5,000, 10,000, 100,000, 1,000,000 games. Corresponding to these cases,
we find

t = 2.2354, 0.9999, 0.7071, 0.2236, 0.0707

and hence

$$\frac{2}{\sqrt{\pi}}\int_0^t e^{-z^2}\,dz = 0.9984,\ 0.8427,\ 0.6827,\ 0.2482,\ 0.0796.$$

The corresponding approximate values of $y_{100,n}$ are

0.0016, 0.1573, 0.3173, 0.7518, 0.9204.

Thus, for a player possessing $100 there is very little risk of being ruined 
in the course of 1,000 games even if he stakes $1 at each game. The risk 
is considerably larger, but still fairly small, when 5,000 games are played. 
In 10,000 games we can bet 2 to 1 that the player will still be able to 
continue. But when the limit set for the number of games becomes
100,000, we can bet 3 to 1 that the player will be ruined somewhere in the
course of those 100,000 games. Finally, there is little chance to escape 
ruin in a series of 1,000,000 games. The risk of ruin naturally increases 
with the number of games, but not so fast as might appear at first sight. 
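The table of approximate values is easily recomputed. The sketch below takes the five values of t exactly as printed and evaluates one minus the probability integral with the error function of the standard library; it does not attempt to reproduce the definition of t or the remainder term.

```python
from math import erf

# Values of t as printed, keyed by the number of games n.
t_values = {
    1_000: 2.2354,
    5_000: 0.9999,
    10_000: 0.7071,
    100_000: 0.2236,
    1_000_000: 0.0707,
}

# (2 / sqrt(pi)) * integral from 0 to t of exp(-z**2) dz is erf(t).
integral = {n: erf(t) for n, t in t_values.items()}
ruin_estimate = {n: 1.0 - value for n, value in integral.items()}
```

The odds quoted in the text follow from these figures: about 0.68 against 0.32 at 10,000 games, and about 0.75 against 0.25 at 100,000 games.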

7. We conclude this chapter by solving the following problem, 
where the fortunes of both players are finite. 

Problem 4. Players A and B agree to play not more than n games,
the probabilities of winning a single game being p and q, respectively.
Assuming that the fortunes of A and B amount to a and b single stakes
which are equal for both, find the probability for A to be ruined in the
course of n games.

Solution. Let $z_{x,t}$ be the probability for the player A to be ruined
when his fortune is x (and that of his adversary $a + b - x$) and he can
play only t games. Evidently $z_{x,t}$ satisfies the equation

(13) $z_{x,t} = p\,z_{x+1,t-1} + q\,z_{x-1,t-1}$

perfectly similar to equation (8), but the complementary conditions
serving to determine $z_{x,t}$ completely are different. First we have

(14) $z_{0,t} = 1$ for $t \geq 0$.

Next,

(15) $z_{a+b,t} = 0$ for $t \geq 0$,

because if A gets all the money from B, the games stop and A cannot be
ruined. Finally,

(16) $z_{x,0} = 0$ for x = 1, 2, 3, . . . a + b - 1,

because A, having money left at the end of play, naturally cannot be
ruined.

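Since (13) has two series of particular solutions

$$\alpha^x\beta^t \quad \text{and} \quad \alpha'^x\beta^t$$

where $\alpha$ and $\alpha'$ are roots of the equation

$$p\alpha^2 - \beta\alpha + q = 0$$

both developable into series of descending powers of $\beta$ for $|\beta| > 1$, we
shall seek $z_{x,t}$ in the form

$$z_{x,t} = \frac{1}{2\pi i}\int \beta^t\left[f(\beta)\alpha^x + \varphi(\beta)\alpha'^x\right]d\beta.$$

Here the integration is made along a circle of sufficiently large radius and
$f(\beta)$ and $\varphi(\beta)$ are two unknown functions which can be developed into
series of descending powers of $\beta$. Obviously $z_{x,t}$ satisfies (13) identically
in x and t. For x = 0 and $t \geq 0$ we have the condition

$$\frac{1}{2\pi i}\int\left[f(\beta) + \varphi(\beta)\right]\beta^t\,d\beta = 1; \quad t = 0, 1, 2, \ldots$$

which is satisfied if

(17) $$f(\beta) + \varphi(\beta) = \frac{1}{\beta - 1}.$$

Condition (15) will be satisfied if

(18) $$\alpha^{a+b}f(\beta) + \alpha'^{a+b}\varphi(\beta) = 0$$

and it remains to show that at the same time (16) is satisfied. Solving
(17) and (18), we have

$$f(\beta) = \frac{\alpha'^{a+b}}{\alpha'^{a+b} - \alpha^{a+b}} \cdot \frac{1}{\beta - 1}, \qquad \varphi(\beta) = -\frac{\alpha^{a+b}}{\alpha'^{a+b} - \alpha^{a+b}} \cdot \frac{1}{\beta - 1}$$

and

(19) $$f(\beta)\alpha^x + \varphi(\beta)\alpha'^x = \frac{\alpha'^{a+b}\alpha^x - \alpha^{a+b}\alpha'^x}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})} = \left(\frac{q}{p}\right)^x\frac{\alpha'^{a+b-x} - \alpha^{a+b-x}}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})}.$$

Now let $\alpha$ be the root vanishing for $\beta = \infty$ and $\alpha'$ the other root whose
development in series of descending powers of $\beta$ starts with the term
containing $\beta$. Evidently the development of (19) for

x = 1, 2, 3, . . . a + b - 1

does not contain terms involving the first power of $1/\beta$, and hence
$z_{x,0} = 0$ if x = 1, 2, 3, . . . a + b - 1 as it should be. The solution
of (13) satisfying (14), (15), (16) being unique, its analytical expression is
therefore

$$z_{x,t} = \frac{1}{2\pi i}\left(\frac{q}{p}\right)^x\int\frac{\alpha'^{a+b-x} - \alpha^{a+b-x}}{\alpha'^{a+b} - \alpha^{a+b}} \cdot \frac{\beta^t\,d\beta}{\beta - 1}$$

whence for x = a and t = n

$$z_{a,n} = \frac{1}{2\pi i}\left(\frac{q}{p}\right)^a\int\frac{\alpha'^b - \alpha^b}{\alpha'^{a+b} - \alpha^{a+b}} \cdot \frac{\beta^n\,d\beta}{\beta - 1}.$$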

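To find an explicit expression for $z_{a,n}$ it remains to find the coefficient of 
$1/\beta$ in the development of 

$$P = \left(\frac{q}{p}\right)^a \frac{\alpha'^{b} - \alpha^{b}}{\alpha'^{a+b} - \alpha^{a+b}} \cdot \frac{\beta^n}{\beta - 1}$$

in series of descending powers of $\beta$. This can be done in two different 
ways. First we can substitute for $\alpha'$ its expression in $\alpha$: 

$$\alpha' = \frac{q}{p}\alpha^{-1}$$

and present P in the form 

$$P = \frac{\alpha^a - \left(\frac{p}{q}\right)^b \alpha^{a+2b}}{1 - \left(\frac{p}{q}\right)^{a+b} \alpha^{2a+2b}} \cdot \frac{\beta^n}{\beta - 1}$$

or developing into series 

$$P = \left[\alpha^a - \left(\frac{p}{q}\right)^b \alpha^{a+2b} + \left(\frac{p}{q}\right)^{a+b} \alpha^{3a+2b} - \left(\frac{p}{q}\right)^{a+2b} \alpha^{3a+4b} + \cdots\right] \frac{\beta^n}{\beta - 1}.$$

But the coefficient of $1/\beta$ in 

$$\alpha^m \frac{\beta^n}{\beta - 1}$$

by the second solution of Prob. 3 is the probability $y_{m,n}$ for a player with 
a fortune m to be ruined by an infinitely rich player in the course of n 
games. Hence, the final expression for $z_{a,n}$ is 

$$z_{a,n} = y_{a,n} - \left(\frac{p}{q}\right)^b y_{a+2b,n} + \left(\frac{p}{q}\right)^{a+b} y_{3a+2b,n} - \left(\frac{p}{q}\right)^{a+2b} y_{3a+4b,n} + \cdots$$

the terms of this series being alternately of the form 

$$\left(\frac{p}{q}\right)^{ka+kb} y_{(2k+1)a+2kb,\,n}$$

and 

$$-\left(\frac{p}{q}\right)^{ka+(k+1)b} y_{(2k+1)a+(2k+2)b,\,n}$$

for k = 0, 1, 2, . . . . The series stops by itself as soon as the first 
subscript of $y_{x,n}$ becomes greater than n. 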
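
As a check on the series, its value can be compared with a direct iteration of equation (13). The sketch below is illustrative only and not part of the original text; the helper names are assumptions, and $y_{m,n}$ is obtained by giving the adversary a fortune of n + 1 stakes, which cannot be exhausted in n games.

```python
def ruin_within(a, b, n, p):
    """z_{a,n} by iterating equation (13) with conditions (14)-(16)."""
    q = 1.0 - p
    total = a + b
    z = [0.0] * (total + 1)
    z[0] = 1.0
    for _ in range(n):
        new = z[:]
        for x in range(1, total):
            new[x] = p * z[x + 1] + q * z[x - 1]
        z = new
    return z[a]


def ruin_vs_rich(m, n, p):
    """y_{m,n}: ruin within n games against an infinitely rich adversary."""
    if m > n:
        return 0.0  # a fortune of m cannot be lost in fewer than m games
    return ruin_within(m, n + 1, n, p)


def ruin_by_series(a, b, n, p):
    """Alternating series in y for z_{a,n}; it stops once the first subscript exceeds n."""
    r = p / (1.0 - p)
    total = 0.0
    k = 0
    while (2 * k + 1) * a + 2 * k * b <= n:
        first = (2 * k + 1) * a + 2 * k * b
        second = (2 * k + 1) * a + (2 * k + 2) * b
        total += r ** (k * a + k * b) * ruin_vs_rich(first, n, p)
        total -= r ** (k * a + (k + 1) * b) * ruin_vs_rich(second, n, p)
        k += 1
    return total


if __name__ == "__main__":
    print(ruin_by_series(2, 3, 20, 0.6), ruin_within(2, 3, 20, 0.6))
```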

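To obtain a second expression of $z_{a,n}$ we notice that 

$$\left(\frac{q}{p}\right)^a \frac{\alpha'^{b} - \alpha^{b}}{\alpha'^{a+b} - \alpha^{a+b}}$$

is a rational function of $\beta$ whose denominator 

$$R = \frac{\alpha'^{a+b} - \alpha^{a+b}}{\alpha' - \alpha}$$

is a polynomial in $\beta$ of the degree a + b − 1. To find the roots of R = 0, 
we set $\beta = 2\sqrt{pq}\cos\varphi$. Since, then, 

$$\alpha = \sqrt{\frac{q}{p}}\,(\cos\varphi - i\sin\varphi), \qquad \alpha' = \sqrt{\frac{q}{p}}\,(\cos\varphi + i\sin\varphi),$$

we have 

$$R = \left(\frac{q}{p}\right)^{\frac{a+b-1}{2}} \frac{\sin (a + b)\varphi}{\sin\varphi}.$$

The equation 

$$\frac{\sin (a + b)\varphi}{\sin\varphi} = 0$$

having roots 

$$\varphi_h = \frac{h\pi}{a + b}; \quad h = 1, 2, \ldots a + b - 1,$$

the a + b − 1 roots of R are 

$$\beta_h = 2\sqrt{pq}\cos\varphi_h.$$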

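Now we can resolve the rational function P into a sum of simple elements 
as follows: 

$$P = E(\beta) + \frac{A_0}{\beta - 1} + \sum_{h=1}^{a+b-1} \frac{A_h}{\beta - \beta_h}$$

where 

$$A_0 = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}$$

and for h > 0 

$$A_h = -\frac{(2\sqrt{pq})^{n+1}\left(\frac{q}{p}\right)^{a/2}}{a + b} \cdot \frac{\sin\varphi_h \sin a\varphi_h (\cos\varphi_h)^n}{1 - 2\sqrt{pq}\cos\varphi_h}$$

while $E(\beta)$ is the integral part of P. The coefficient of $1/\beta$ in the 
development of P being 

$$A_0 + \sum_{h=1}^{a+b-1} A_h$$

we have a new explicit expression for $z_{a,n}$: 

$$z_{a,n} = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}} - \frac{(2\sqrt{pq})^{n+1}\left(\frac{q}{p}\right)^{a/2}}{a + b} \sum_{h=1}^{a+b-1} \frac{\sin\frac{\pi h}{a+b}}{1 - 2\sqrt{pq}\cos\frac{\pi h}{a+b}} \cdot \sin\frac{\pi a h}{a+b} \left(\cos\frac{\pi h}{a+b}\right)^n. \tag{20}$$

This expression shows clearly that $z_{a,n}$, with increasing n, approaches 
the limit 

$$\frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}$$

representing the probability of ruin when the number of games is unlimited, 
in complete accord with the solution of Prob. 1. 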

The first term in (20) naturally must be replaced by $\frac{b}{a + b}$ in case 
$p = q = \frac{1}{2}$. This form of solution was given first by Lagrange. 
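
Formula (20) lends itself to a direct numerical check against the recursion (13). The following Python sketch is an illustration under the notation above, not part of the original text; the function names are assumptions, and the case p = q uses b/(a + b) for the first term as just noted.

```python
import math


def ruin_lagrange(a, b, n, p):
    """z_{a,n} evaluated by the trigonometric formula (20)."""
    q = 1.0 - p
    s = a + b
    if p == q:
        limit = b / s  # first term when p = q = 1/2
    else:
        limit = q ** a * (p ** b - q ** b) / (p ** s - q ** s)
    g = 2.0 * math.sqrt(p * q)
    acc = 0.0
    for h in range(1, s):
        phi = math.pi * h / s
        acc += (math.sin(phi) * math.sin(a * phi) * math.cos(phi) ** n
                / (1.0 - g * math.cos(phi)))
    return limit - g ** (n + 1) * (q / p) ** (a / 2) / s * acc


def ruin_by_recursion(a, b, n, p):
    """z_{a,n} by iterating equation (13) with conditions (14)-(16)."""
    q = 1.0 - p
    z = [0.0] * (a + b + 1)
    z[0] = 1.0
    for _ in range(n):
        new = z[:]
        for x in range(1, a + b):
            new[x] = p * z[x + 1] + q * z[x - 1]
        z = new
    return z[a]


if __name__ == "__main__":
    print(ruin_lagrange(2, 3, 20, 0.6), ruin_by_recursion(2, 3, 20, 0.6))
```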


Problems for Solution 

1. Players A and B with fortunes of $50 and $100, respectively, agree to play until 
one of them is ruined. The probabilities of winning a single game are 2/3 and 1/3, 
respectively, for A and B, and they stake $1 at each game. What is the probability 
of ruin for the player A? Ans. Very nearly $2^{-50} = 8.88 \cdot 10^{-16}$. 

2. If A and B at each single game stake $3 and $2, respectively, and have fortunes 
of $30 and $20 at the beginning, what is the approximate value of the probability 
that A will be ruined if the probability of his winning a single game is (a) p — %] 

Ans. (a) 0.40 + A; |a| < 1.7 X lO"®; (6) 0.96 + A; [Al < 4.6 X lO-^. 

3. A player A with the fortune $a plays an unlimited number of games against an 
infinitely rich adversary with the probability p of winning a single game. He stakes 
$1 at each game, while his rich adversary risks staking such a sum β as to make the 




game favorable to A. What is the probability that A will be ruined in the course 
of the games? Give numerical results if (a) a = 10, 2? = /? = 3,* (b) a = 100, 

P = Mj A'tis. Let ^ < 1 be a positive root of the equation •— 0 + $ •= 0. 

The required probability P is : P = 

In case (a) P = 0.002257; in case (5) P = 3.43 * IO-^t. 

4. A player A whose fortune is $10 agrees to play not more than 20 games against 

an infinitely rich adversary, both staking $1 with an equal probability of winning a 
single game. What is the probability that A will not be ruined in the course of 
20 games? Ans. 0.9734. 

5. Players A and B with $1 and $2, respectively, agree to play not more than n 
equitable games, staking $1 at each game. What are the probabilities of their ruin? 


Ans. For A: $\frac{2}{3} - \frac{3 + (-1)^n}{3 \cdot 2^{n+1}}$; for B: $\frac{1}{3} - \frac{3 - (-1)^n}{3 \cdot 2^{n+1}}$. 


6. Players A and B with $2 and $3, respectively, play a series of equitable games, 
both staking $1 at each game. What are the probabilities of their ruin in n games? 
Give the numerical result if n = 20. Ans. 


For A: 


For B: 


5 5lV 4 y 

2 ^// Vs+i V 


■nv 


7 




6 = 1 if n is odd, € == 2 if n is even. 


17 — 1 if n is even, 


1 


= 2 if n is odd. 


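7. Find the expression of $y_{a,n}$, the probability of the ruin of A when his adversary 
B is infinitely rich, corresponding to formula (20). Ans. From the definition of a 
definite integral it follows that 

$$y_{a,n} = y_{a,\infty} - \frac{(2\sqrt{pq})^{n+1}}{\pi}\left(\frac{q}{p}\right)^{a/2} \int_0^{\pi} \frac{\sin\varphi \sin a\varphi}{1 - 2\sqrt{pq}\cos\varphi}(\cos\varphi)^n\, d\varphi$$

where 

$$y_{a,\infty} = 1 \ \text{ if } p \leq q; \qquad y_{a,\infty} = \left(\frac{q}{p}\right)^a \ \text{ if } p > q.$$

If the games are equitable and n differs from a by an even number, then 

$$y_{a,n} = 1 - \frac{2}{\pi}\int_0^{\pi/2} \frac{\sin a\varphi}{\sin\varphi}(\cos\varphi)^{n+1}\, d\varphi.$$

This formula was given by Laplace. 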

8. Referring to the last formula in the preceding problem, show that 


2/a,n 


= 1 - 



-b A 


t 


a 

\/2(n + I) 


|A| 


1 . 2 

< v h -e 

27rn n 


32 . 


where 
Indication of the Proof. It is important to prove the following inequalities first 


whence 


<p (cos 


W'f'l 


sin ^ 

^(cos 


< e 


for 


0 < ^ ^ 


sin <p 


> e 


<p (cos 
sin <p 


n+l 


n + § 


(n+l)y* 

8 


for 


-f- 1 


0 < ^ 


0 < ^ < 1 


provided 0 < ^ ^ 7r/4. The rest of the proof is easy. 

9. Attempt a direct proof of the important lemma (page 144) used in the discus- 
sion of Prob. 2. 

Hint: The proof can be based upon the following proposition, generalizing an 
important theorem on determinants due to Minkowski: Let 

$$f_i = a_{1i}x_1 + a_{2i}x_2 + \cdots + a_{ni}x_n; \quad i = 1, 2, 3, \ldots n$$

be a system of linear forms whose coefficients satisfy the following conditions: 

(1) $a_{ii} > 0$; $a_{ki} \leq 0$ if $k \neq i$; $a_{1i} + a_{2i} + \cdots + a_{ni} \geq 0$. 

(2) One of these sums is positive. 

If these forms assume nonnegative values, then every $x_i \geq 0$ (i = 1, 2, . . . n). 
Proof by induction: Express $x_n$ through $x_1, x_2, \ldots x_{n-1}$, thus: 

$$x_n = \frac{f_n - a_{1n}x_1 - a_{2n}x_2 - \cdots - a_{n-1,n}x_{n-1}}{a_{nn}}$$

and substitute into the remaining forms. Show that the resulting forms in $x_1, x_2, 
\ldots x_{n-1}$ satisfy the same conditions (1) and (2). Hence, it remains to prove the 
proposition for two forms, which can easily be done. 
CHAPTER IX 


MATHEMATICAL EXPECTATION 

1. Bernoulli's theorem, important though it is, is but the first link 
in a chain of theorems of the same character, all contained in an extremely 
general proposition with which we shall deal in the next chapter. But 
before proceeding to this task, it is necessary to extend the definition of 
“mathematical expectation,” an important concept originating in 
connection with games of chance. 

If, according to the conditions of the game, the player can win a 
sum a with probability p, and lose a sum b with probability q = 1 − p, 
the mathematical expectation of his gain is by definition 

pa − qb. 

Considering the loss as a negative gain, we may say that the gain of the 
player may have only two values, a and −b, with the corresponding 
probabilities p and q, so that the expectation of his gain is the sum of the 
products of two possible values of the gain by their probabilities. In this 
case, the gain appears as a variable quantity possessing two values. 

Variable quantities with a definite range of values each one of which, 
depending on chance, can be attained with a definite probability, are 
called “chance variables,” or, using a Greek term, “stochastic” variables. 
They play an important part in the theory of probability. A stochastic 
variable is defined (a) if the set of its possible values is given, and (b) if 
the probability to attain each particular value is also given. 

It is easy to give examples of stochastic variables. The gain in a 
game of chance is a stochastic variable with two values. The number of 
points on a die that is tossed is a stochastic variable with six values, 
1, 2, . . . 6, each of which has the same probability 1/6. A number on 
a ticket drawn from an urn containing 20 tickets numbered from 1 to 20 
is a stochastic variable with 20 values, and the probability to attain 
any one of them is 1/20. Each of two urns contains 2 white and 2 black 
balls. Simultaneously, one ball is transferred from the first urn into the 
second, while one ball from the latter is transferred into the first. After 
this exchange, the number of white balls in one of the urns may be regarded 
as a stochastic variable with three values, 1, 2, 3, whose corresponding 
probabilities are, respectively, 1/4, 1/2, 1/4. It is natural to extend the 
concept of mathematical expectation to stochastic variables in general. 

Suppose that a stochastic variable x possesses n values: 

$$x_1, x_2, \ldots x_n,$$

and 

$$p_1, p_2, \ldots p_n$$

denote the respective probabilities for x to assume values $x_1, x_2, \ldots x_n$. 
By definition the mathematical expectation of x is 

$$E(x) = p_1x_1 + p_2x_2 + \cdots + p_nx_n.$$

It is understood in this definition that the possible values of the 
variable x are numerically different. For instance, if the variable is a 
number of points on a die, its numerically different values are 1, 2, 3, 4, 5, 
6, each having the same probability, 1/6. By definition, the mathematical 
expectation of the number of points on a die is 

$$\tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5.$$

If the variable is the number on a ticket drawn from an urn containing 
20 tickets numbered from 1 to 20, its numerically different values are 
represented by numbers from 1 to 20, and the probability of each of 
these values is 1/20, so that the mathematical expectation of the number 
on a ticket is 

$$\tfrac{1}{20}(1 + 2 + \cdots + 20) = 10.5.$$

2. It is obvious that the computation of mathematical expectation 
requires only the knowledge of the numerically different values of the 
variables with their respective probabilities. But in some cases this 
computation is greatly simplified by extending the definition of mathe- 
matical expectation. Suppose that, corresponding to mutually exclusive 
and exhaustive cases $A_1, A_2, \ldots A_m$, the variable x assumes the values 
$x_1, x_2, \ldots x_m$, with the corresponding probabilities $p_1, p_2, \ldots p_m$; 
we can define the mathematical expectation of x by 

$$E(x) = p_1x_1 + p_2x_2 + \cdots + p_mx_m.$$

What distinguishes this extended definition from the original one is that 
in the second definition the values $x_1, x_2, \ldots x_m$ need not be numerically 
different; the only condition is that they are determined by mutually 
exclusive and exhaustive cases. 

To make this distinction clear, suppose that the variable x is the 
number of points on two dice. Numerically different values of this 
variable are 

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 

and their respective probabilities 

1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36. 

Therefore, by original definition, the expectation of x is 

$$\tfrac{2}{36} + \tfrac{6}{36} + \tfrac{12}{36} + \tfrac{20}{36} + \tfrac{30}{36} + \tfrac{42}{36} + \tfrac{40}{36} + \tfrac{36}{36} + \tfrac{30}{36} + \tfrac{22}{36} + \tfrac{12}{36} = \tfrac{252}{36} = 7.$$

But we can distinguish 36 exhaustive and mutually exclusive cases according 
to the number of points on each die and, correspondingly, 36 values 
of the variable x, as shown in the following table: 


First die   Second die    x     First die   Second die    x 
    1           1         2         4           1         5 
    1           2         3         4           2         6 
    1           3         4         4           3         7 
    1           4         5         4           4         8 
    1           5         6         4           5         9 
    1           6         7         4           6        10 
    2           1         3         5           1         6 
    2           2         4         5           2         7 
    2           3         5         5           3         8 
    2           4         6         5           4         9 
    2           5         7         5           5        10 
    2           6         8         5           6        11 
    3           1         4         6           1         7 
    3           2         5         6           2         8 
    3           3         6         6           3         9 
    3           4         7         6           4        10 
    3           5         8         6           5        11 
    3           6         9         6           6        12 


The probability of each of these 36 cases being 1/36, by the extended 
definition the mathematical expectation of x is 

$$\frac{2 + 3\cdot 2 + 4\cdot 3 + 5\cdot 4 + 6\cdot 5 + 7\cdot 6 + 8\cdot 5 + 9\cdot 4 + 10\cdot 3 + 11\cdot 2 + 12}{36} = 7$$


as it should be. 
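
The two computations for the pair of dice can be reproduced mechanically. The sketch below is an illustration in Python and not part of the original text; the variable names are assumptions. It forms the 36 equally probable cases, applies the extended definition to them, then groups equal values and applies the original definition.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# The 36 mutually exclusive and exhaustive cases, each with probability 1/36.
cases = [(i + j, Fraction(1, 36)) for i, j in product(range(1, 7), repeat=2)]

# Extended definition: sum over cases, equal values allowed to repeat.
extended = sum(prob * value for value, prob in cases)

# Original definition: first collect the probability of each distinct value.
distribution = defaultdict(Fraction)
for value, prob in cases:
    distribution[value] += prob
original = sum(prob * value for value, prob in distribution.items())

if __name__ == "__main__":
    print(original, extended)
```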

It is important to show that both definitions always give the same 
value for the mathematical expectation. 

Let $x_1, x_2, \ldots x_m$ be the values of the variable x corresponding 
to mutually exclusive and exhaustive cases $A_1, A_2, \ldots A_m$ and 
$p_1, p_2, \ldots p_m$ their respective probabilities. By the extended 
definition of mathematical expectation, we have 

$$E(x) = p_1x_1 + p_2x_2 + \cdots + p_mx_m. \tag{1}$$
The values $x_1, x_2, \ldots x_m$ are not necessarily numerically different, 
the numerically different values being 

$$\xi, \eta, \zeta, \ldots \lambda.$$

We can suppose that the notation is chosen in such a way that 

$x_1, x_2, \ldots x_a$ are equal to $\xi$; 

$x_{a+1}, x_{a+2}, \ldots x_b$ are equal to $\eta$; 

$x_{b+1}, x_{b+2}, \ldots x_c$ are equal to $\zeta$; 

. . . . . . . . . . 

$x_{l+1}, x_{l+2}, \ldots x_m$ are equal to $\lambda$. 

Hence, the right-hand member of (1) can be represented thus: 

$$(p_1 + p_2 + \cdots + p_a)\xi + (p_{a+1} + p_{a+2} + \cdots + p_b)\eta + \cdots + (p_{l+1} + p_{l+2} + \cdots + p_m)\lambda.$$

But by the theorem of total probabilities, the sum 

$$p_1 + p_2 + \cdots + p_a$$

represents the probability P for the variable x to assume a determined 
value $\xi$, because this can happen in a mutually exclusive ways; namely, 
when $x = x_1$, or $x = x_2$, . . . or $x = x_a$. By a similar argument we see 
that the sums 

$$p_{a+1} + p_{a+2} + \cdots + p_b$$
$$p_{b+1} + p_{b+2} + \cdots + p_c$$
$$\cdots$$
$$p_{l+1} + p_{l+2} + \cdots + p_m$$

represent the probabilities Q, R, . . . T for the variable x to assume 
values $\eta, \zeta, \ldots \lambda$. Therefore, the right-hand member of (1) reduces 
to the sum 

$$P\xi + Q\eta + R\zeta + \cdots + T\lambda$$

which, by the original definition, is the mathematical expectation of x. 

If, corresponding to mutually exclusive and exhaustive cases, a 
variable x assumes the same value a (in other words, remains constant), 
it is almost evident that its mathematical expectation is a, because the 
sum of the probabilities of mutually exclusive and exhaustive cases is 1. 
It is also evident that the expectation of ax, where a is a constant, is 
equal to a times the expectation of x. 

Note: Very often the mathematical expectation of a stochastic variable is called 
its “mean value.” 

Mathematical Expectation of a Sum 
3. In many cases the computation of mathematical expectation is 
greatly facilitated by means of the following very general theorem: 





Theorem. The mathematical expectation of the sum of several variables 
is equal to the sum of their expectations; or, in symbols, 

$$E(x + y + z + \cdots + w) = E(x) + E(y) + E(z) + \cdots + E(w).$$

Proof. We shall prove this theorem first in the case of a sum of two 
variables. Let x assume numerically different values $x_1, x_2, \ldots x_m$, 
while numerically different values of y are $y_1, y_2, \ldots y_n$. In regard to 
the sum x + y we can distinguish mn mutually exclusive cases; namely, 
when x assumes a definite value $x_i$ and y another definite value $y_j$, while i 
and j range respectively over numbers 1, 2, 3, . . . m and 1, 2, 3, . . . n. 
If $p_{ij}$ denotes the probability of coexistence of the equalities 

$$x = x_i, \quad y = y_j$$

we have by the extended definition of mathematical expectation 

$$E(x + y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}(x_i + y_j),$$

or 

$$E(x + y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}x_i + \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}y_j. \tag{2}$$

As the variable x assumes a definite value $x_i$ in n mutually exclusive 
ways (namely, when the value $x_i$ of x is accompanied by the values 
$y_1, y_2, \ldots y_n$ of y) it is obvious that the sum 

$$\sum_{j=1}^{n} p_{ij}$$

represents the probability $p_i$ of the equality $x = x_i$. In a similar manner 
we see that the sum 

$$\sum_{i=1}^{m} p_{ij}$$

represents the probability $q_j$ of the equality $y = y_j$. Therefore 

$$\sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}x_i = \sum_{i=1}^{m} x_i \sum_{j=1}^{n} p_{ij} = \sum_{i=1}^{m} p_ix_i = E(x),$$

and similarly 

$$\sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}y_j = \sum_{j=1}^{n} y_j \sum_{i=1}^{m} p_{ij} = \sum_{j=1}^{n} q_jy_j = E(y);$$

that is, by (2) 

$$E(x + y) = E(x) + E(y)$$

which proves the theorem for the sum of two variables. 

If we deal with the sum of three variables x + y + z, we may consider 
it at first as the sum of x + y and z and, applying the foregoing result, 
we get 

$$E(x + y + z) = E(x + y) + E(z);$$

and again, by substituting E(x) + E(y) for E(x + y), 

$$E(x + y + z) = E(x) + E(y) + E(z).$$

In a similar way we may proceed further and prove the theorem for the 
sum of any number of variables. 

4. The theorem concerning mathematical expectation of sums, 
simple though it is, is of fundamental importance on account of its very 
general nature and will be used frequently. At present, we shall use it 
in the solution of a few selected problems. 

Problem 1. What is the mathematical expectation of the sum of 
points on n dice? 

Solution. Denoting by $x_i$ the number of points on the ith die, the 
sum of the points on n dice will be 

$$s = x_1 + x_2 + \cdots + x_n,$$

and by the preceding theorem 

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n).$$

But for every single die 

$$E(x_i) = \tfrac{7}{2}; \quad i = 1, 2, \ldots n;$$

therefore 

$$E(s) = \frac{7n}{2}.$$


Problem 2. What is the mathematical expectation of the number of 
successes in n trials with constant probability p? 

Solution. Suppose that we attach to every trial a variable which 
has the value 1 in case of a success and the value 0 in case of failure. If 
the variables attached to trials 1, 2, 3, . . . n are denoted by $x_1, x_2, \ldots 
x_n$, their sum 

$$m = x_1 + x_2 + \cdots + x_n$$

obviously gives the number of successes in n trials. Therefore, the 
required expectation is 

$$E(m) = E(x_1) + E(x_2) + \cdots + E(x_n).$$
But for every i = 1, 2, 3, . . . n 

$$E(x_i) = p \cdot 1 + (1 - p) \cdot 0 = p,$$

because $x_i$ may have values 1 and 0 with the probabilities p and 1 − p 
which are the same as the probabilities of a success or a failure in the ith 
trial. Hence, 

$$E(m) = np$$

or 

$$E(m - np) = 0,$$

which may also be written in the form 

$$\sum_{m=0}^{n} T_m(m - np) = 0.$$

This result was obtained on page 116 in a totally different and more 
complicated way. The new deduction is preferable in that it is more 
elementary and can easily be extended to more complicated cases, as 
we shall see in the next problem. 

Problem 3. Suppose that we have a series of n trials independent or 
not, the probability of an event being $p_i$ in the ith trial when nothing is 
known about the results of other trials. What is the mathematical 
expectation of the number of successes m in n trials? 

Solution. Again let us introduce the variable $x_i$ connected with 
the ith trial in such a way that $x_i = 1$ when the trial results in a success 
and $x_i = 0$ when it results in failure. Obviously, 

$$m = x_1 + x_2 + \cdots + x_n$$

and 

$$E(m) = E(x_1) + E(x_2) + \cdots + E(x_n).$$

But 

$$E(x_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i$$

and therefore 

$$E(m) = p_1 + p_2 + \cdots + p_n.$$

For instance, if we have 5 urns containing 1 white, 9 black; 2 white, 
8 black; 3 white, 7 black; 4 white, 6 black; 5 white, 5 black balls, and we 
draw one ball out of every urn, the mathematical expectation of the 
number of white balls taken will be: 

$$E(m) = \tfrac{1}{10} + \tfrac{2}{10} + \tfrac{3}{10} + \tfrac{4}{10} + \tfrac{5}{10} = \tfrac{3}{2}.$$
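
The value for the five urns can be confirmed by listing every possible draw. The Python sketch below is an illustration only and not part of the original text; the names are assumptions. It treats the ten balls of each urn as equally likely and averages the number of white balls over all $10^5$ combined outcomes.

```python
from fractions import Fraction
from itertools import product

# Number of white balls in each urn; every urn holds 10 balls in all.
whites = [1, 2, 3, 4, 5]

# Represent each urn as a list of balls: 1 for white, 0 for black.
urns = [[1] * w + [0] * (10 - w) for w in whites]

# Every combined outcome (one ball per urn) is equally probable.
outcomes = list(product(*urns))
expectation = Fraction(sum(sum(draw) for draw in outcomes), len(outcomes))

if __name__ == "__main__":
    print(expectation)
```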

Problem 4. An urn contains a white and b black balls, and c balls are 
drawn. What is the mathematical expectation of the number of the 
white balls drawn? 
Solution. To every ball taken we attach a variable which has the 
value 1 if the extracted ball is white, and the value 0 otherwise. The 
number of white balls drawn will then be 

$$s = x_1 + x_2 + \cdots + x_c.$$

But the probability that the ith ball removed will be white when nothing 
is known of the other balls is $\frac{a}{a + b}$; therefore 

$$E(x_i) = \frac{a}{a + b} \cdot 1 + \frac{b}{a + b} \cdot 0 = \frac{a}{a + b}$$

for every i, and the required expectation is 

$$E(s) = \frac{ca}{a + b}.$$

Problem 5. An urn contains n tickets numbered from 1 to n, and 
m tickets are drawn at a time. What is the mathematical expectation 
of the sum of numbers on the tickets drawn? 

Solution. Suppose that m tickets drawn from the urn are disposed 
in a certain order, and a variable is attached to every ticket expressing 
its number. Denoting the variable attached to the ith ticket by $x_i$, 
the sum of the numbers on all m tickets apparently is 

$$s = x_1 + x_2 + \cdots + x_m.$$

But when taken singly, the variable $x_i$ may represent any of the numbers 
1, 2, 3, . . . n, the probability of its being equal to any one of these 
numbers being 1/n. By the definition of mathematical expectation, we 
have 

$$E(x_i) = \frac{1 + 2 + 3 + \cdots + n}{n} = \frac{n + 1}{2}$$

and therefore 

$$E(s) = \frac{m(n + 1)}{2}.$$

For example, taking the French lottery where n = 90 and m = 5, we 
find for the mathematical expectation of the sum of numbers on all 5 
tickets 

$$E(s) = 227.5.$$

Problem 6. An urn contains n tickets numbered from 1 to n. These 
tickets are drawn one by one, so that a certain number appears in the 
first place, another number in the second place, and so on. We shall say 
that there is a “coincidence” when the number on a ticket corresponds 
to the place it occupies. For instance, there is a coincidence when the 
first ticket has number 1 or the second ticket has number 2, etc. Find 
the mathematical expectation of the number of coincidences. Also, find 
the probability that there will be none, or one, or two, etc., coincidences. 

Solution. Let $x_i$ denote a variable which has the value 1 if there is 
coincidence in the ith place, otherwise $x_i = 0$. The sum 

$$s = x_1 + x_2 + \cdots + x_n$$

gives the total number of coincidences and 

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n).$$

But 

$$E(x_i) = \frac{1}{n}$$

because the probability of drawing a ticket with the number i in the ith 
place without any regard to other tickets obviously is 1/n; therefore, 

$$E(s) = n \cdot \frac{1}{n} = 1.$$

On the other hand, denoting the probability of exactly i coincidences by 
$p_i$, we have by definition 

$$E(s) = p_1 + 2p_2 + \cdots + np_n,$$

and, comparing with the preceding result, we obtain 

$$p_1 + 2p_2 + \cdots + np_n = 1. \tag{3}$$

Let us denote by $\varphi(n)$ the probability that in drawing n tickets, we shall 
have no coincidences. It is easy to express $p_i$ by means of $\varphi(n - i)$. 
In fact, we have exactly i coincidences in 

$$C_n^i = \frac{n(n - 1) \cdots (n - i + 1)}{1 \cdot 2 \cdot 3 \cdots i}$$

mutually exclusive cases; namely, when the tickets of one of the $C_n^i$ 
specified groups of i tickets have numbers corresponding to their places 
while the remaining n − i tickets do not present coincidences at all. 
By the theorem of compound probability, the probability of i coincidences 
in i specified places is 

$$\frac{1}{n} \cdot \frac{1}{n - 1} \cdots \frac{1}{n - i + 1}$$

and the probability of the absence of coincidences in the remaining n − i 
places is $\varphi(n - i)$. The probability of exactly i coincidences in i specified 
places is therefore 

$$\frac{\varphi(n - i)}{n(n - 1) \cdots (n - i + 1)},$$

and the total probability $p_i$ of exactly i coincidences without specification 
of places is 

$$p_i = \frac{n(n - 1) \cdots (n - i + 1)}{1 \cdot 2 \cdot 3 \cdots i} \cdot \frac{\varphi(n - i)}{n(n - 1) \cdots (n - i + 1)},$$

or 

$$p_i = \frac{\varphi(n - i)}{1 \cdot 2 \cdot 3 \cdots i}. \tag{4}$$

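The symbol $\varphi(0)$ has no meaning, but the preceding formula holds 
good even for i = n if we assume $\varphi(0) = 1$. 

Substituting expression (4) for $p_i$ into (3), we reach the relation 

$$\varphi(n - 1) + \frac{\varphi(n - 2)}{1!} + \frac{\varphi(n - 3)}{2!} + \cdots + \frac{\varphi(0)}{(n - 1)!} = 1;$$

or changing n into n + 1 

$$\varphi(n) + \frac{\varphi(n - 1)}{1!} + \frac{\varphi(n - 2)}{2!} + \cdots + \frac{\varphi(0)}{n!} = 1,$$

which gives successively $\varphi(1), \varphi(2), \varphi(3), \ldots$ by taking 

n = 1, 2, 3, . . . . 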

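The general result, which can easily be verified, is 

$$\varphi(n) = \sum_{k=0}^{n} \frac{(-1)^k}{k!}$$

or, in an explicit form, 

$$\varphi(n) = 1 - \frac{1}{1} + \frac{1}{1 \cdot 2} - \frac{1}{1 \cdot 2 \cdot 3} + \cdots + \frac{(-1)^n}{1 \cdot 2 \cdot 3 \cdots n}.$$

Even for moderate n this is very near to 

$$\frac{1}{e} = 1 - \frac{1}{1} + \frac{1}{1 \cdot 2} - \frac{1}{1 \cdot 2 \cdot 3} + \cdots \text{ ad inf.} = 0.36787944.$$
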
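
For small n the probabilities of coincidences can be found by listing all n! orders of drawing, which gives a check on relations (3) and (4) and on the expression for $\varphi(n)$. The Python sketch below is illustrative and not part of the original text; the function names are assumptions, and places are numbered from 0 for convenience.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def coincidence_distribution(n):
    """p[i] = probability of exactly i coincidences among n tickets,
    obtained by enumerating all n! equally probable orders."""
    counts = [0] * (n + 1)
    for order in permutations(range(n)):
        hits = sum(1 for place, ticket in enumerate(order) if place == ticket)
        counts[hits] += 1
    return [Fraction(c, factorial(n)) for c in counts]


def phi(n):
    """Probability of no coincidence: sum of (-1)^k / k! for k = 0..n."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))


if __name__ == "__main__":
    n = 6
    p = coincidence_distribution(n)
    print(sum(i * p[i] for i in range(n + 1)))  # relation (3)
    print(p[0], phi(n), float(phi(n)))
```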

Mathematical Expectation of a Product 
5. For the product of two or more stochastic variables we do not 
possess anything so general as the foregoing theorem concerning the 
mathematical expectation of sums. An analogous theorem with respect 
to the product of stochastic variables can be established only under 
certain restrictive conditions. 

Several stochastic variables are called “independent” if the probability 
for any one of them to assume a determined value does not depend 
on the values assumed by the remaining variables. For instance, if the 
variables are the numbers of points on dice, they may be considered as 
independent. 

On the other hand, we have a case of dependent variables in numbers 
on tickets drawn in a lottery. For, in this case the fact that certain 
tickets have determined numbers precludes the possibility of any one of 
these numbers appearing on other tickets drawn at the same time. 

If more than two variables are independent according to the above 
definition, it is clear that any two of them are independent. But the 
converse is not true: It is easy to imagine cases when any two of the 
variables are independent and yet they are not independent when taken 
in their totality. Therefore, when speaking of independence of variables, 
we must always specify whether they are independent in their totality 
or only in pairs. 
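
One standard illustration of variables independent in pairs but not in their totality, which is not taken from the original text, uses two fair coins x and y (values 0 and 1) together with a third variable z equal to 1 when x and y differ and to 0 otherwise. The Python sketch below checks both statements by enumeration; all names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Four equally probable cases; z is determined by x and y.
cases = [((x, y, x ^ y), Fraction(1, 4)) for x, y in product((0, 1), repeat=2)]


def prob(event):
    """Total probability of the cases in which event(values) holds."""
    return sum(pr for values, pr in cases if event(values))


def independent_in_pair(i, j):
    """True if variables number i and j satisfy P(both) = P(one) * P(other)
    for every pair of their values."""
    return all(
        prob(lambda v: v[i] == s and v[j] == t)
        == prob(lambda v: v[i] == s) * prob(lambda v: v[j] == t)
        for s in (0, 1)
        for t in (0, 1)
    )


pairs_ok = all(independent_in_pair(i, j) for i, j in [(0, 1), (0, 2), (1, 2)])
# In their totality the variables are dependent: x = y = z = 1 is impossible.
joint = prob(lambda v: v == (1, 1, 1))

if __name__ == "__main__":
    print(pairs_ok, joint, Fraction(1, 8))
```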

For two independent variables we have the following simple theorem: 

Theorem. The mathematical expectation of the product xy of two 
independent variables x and y is equal to the product of their expectations; 
or, in symbols 

$$E(xy) = E(x)E(y).$$

Proof. Let $x_1, x_2, \ldots x_m$ be the complete set of values for x, and 
$y_1, y_2, \ldots y_n$ the analogous set for y. Denoting the probability of 
x being equal to $x_i$ by $p_i$, and similarly, the probability of y being equal 
to $y_j$ by $q_j$, the events 

$$x = x_i \quad \text{and} \quad y = y_j$$

are independent by definition of independence, because the probability 
of x being equal to $x_i$ is not affected by the fact that y has assumed any 
one of its possible values, and it remains $p_i$. 

By the theorem of compound probability the simultaneous occurrence 
of the events 

$$x = x_i \quad \text{and} \quad y = y_j$$

has the probability $p_iq_j$. Again, by the extended definition of 
mathematical expectation 

$$E(xy) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_iq_jx_iy_j$$

because the values of the product xy are determined by mn exhaustive 
and mutually exclusive cases 

$$x = x_i, \quad y = y_j$$

$$i = 1, 2, \ldots m; \quad j = 1, 2, \ldots n.$$

Now, performing the summation with respect to j first, while i remains 
constant, we have 

$$\sum_{j=1}^{n} p_iq_jx_iy_j = p_ix_i\sum_{j=1}^{n} q_jy_j = p_ix_iE(y),$$

and again 

$$E(xy) = \sum_{i=1}^{m} p_ix_iE(y) = E(y)\sum_{i=1}^{m} p_ix_i,$$

or 

$$E(xy) = E(x)E(y).$$

This theorem can be extended to the case of several factors independent 
in their totality. For instance, if x, y, z are independent, it is 
obvious that xy and z are also independent. Hence 

$$E(xyz) = E(xy)E(z),$$

and again 

$$E(xyz) = E(x)E(y)E(z).$$

In a similar way we can extend this theorem to any number of independent 
factors. 

As an important application, let us consider two independent variables 
x and y with the respective expectations a and b. The variables x − a 
and y − b being independent also, we have 

$$E(x - a)(y - b) = E(x - a)E(y - b);$$

but 

$$E(x - a) = E(x) - a = a - a = 0;$$

therefore 

$$E(x - a)(y - b) = 0. \tag{5}$$


Dispersion and Standard Deviation 
6. Let x be a variable and a its mathematical expectation. The 
expectation of 

$$(x - a)^2$$

is called “dispersion” of the variable, and the square root of dispersion 
is usually called “standard deviation.” As 

$$(x - a)^2 = x^2 - 2ax + a^2$$

we can apply the theorem on the expectation of sums to the right-hand 
member of this identity and find 

$$E(x - a)^2 = E(x^2) - 2aE(x) + a^2 = E(x^2) - a^2$$

or, denoting by b the expectation of $x^2$, 

$$E(x - a)^2 = b - a^2. \tag{6}$$

Thus, the computation of dispersion can be reduced to the computation 
of the expectation of the variable itself and its square. Also, denoting 
by $\sigma$ the standard deviation of x, we have the formula 

$$\sigma^2 = b - a^2.$$


For instance, if the variable is the number of points on a die, we have 

$$a = \frac{7}{2}; \qquad b = \frac{1^2 + 2^2 + \cdots + 6^2}{6} = \frac{91}{6}$$

and 

$$\sigma^2 = \frac{35}{12} = 2.917; \qquad \sigma = 1.708.$$
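
Formula (6) translates directly into a short computation. The following Python sketch is illustrative and not part of the original text; the function name `dispersion` is an assumption. It evaluates a and b for the die and returns b − a².

```python
import math
from fractions import Fraction


def dispersion(values, probabilities):
    """Dispersion by formula (6): expectation of x^2 minus squared expectation of x."""
    a = sum(p * x for x, p in zip(values, probabilities))
    b = sum(p * x * x for x, p in zip(values, probabilities))
    return b - a * a


die_values = range(1, 7)
die_probabilities = [Fraction(1, 6)] * 6
d = dispersion(die_values, die_probabilities)

if __name__ == "__main__":
    print(d, round(float(d), 3), round(math.sqrt(d), 3))
```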


Dispersion of Sums 

7. It is important to have a convenient formula to find the dispersion 
of a sum 

$$s = x_1 + x_2 + \cdots + x_n$$

of several stochastic variables. The expectation of s is given by 

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n)$$

or 

$$E(s) = a_1 + a_2 + \cdots + a_n,$$

denoting by $a_i$ the expectation of $x_i$. The deviation of s from its 
expectation is, therefore, 

$$x_1 + x_2 + \cdots + x_n - (a_1 + a_2 + \cdots + a_n),$$

and we have to find the expectation of 

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2.$$

Now we have identically 

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2 = \sum_{i=1}^{n}(x_i - a_i)^2 + 2\sum_{i,j}(x_i - a_i)(x_j - a_j),$$

the last sum being extended over all the different combinations of 
subscripts i and j for which $i \neq j$ and consisting of n(n − 1)/2 terms. 
The mathematical expectation of a sum being equal to the sum of the 
expectations of its terms, we must find the expectations of the terms 

$$(x_i - a_i)^2 \quad \text{and} \quad (x_i - a_i)(x_j - a_j).$$

The first is the dispersion of $x_i$ and can be found from (6); namely, 

$$E(x_i - a_i)^2 = b_i - a_i^2 = \sigma_i^2$$

if $b_i$ is the expectation of $x_i^2$. 

As to 

$$E(x_i - a_i)(x_j - a_j),$$

instead of it we introduce the so-called “correlation coefficient” of $x_i$ 
and $x_j$ 

$$R_{i,j} = \frac{E(x_i - a_i)(x_j - a_j)}{\sigma_i\sigma_j}.$$

Denoting the required dispersion by D, we obtain 

$$D = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 + 2R_{1,2}\sigma_1\sigma_2 + 2R_{1,3}\sigma_1\sigma_3 + \cdots + 2R_{n-1,n}\sigma_{n-1}\sigma_n \tag{7}$$

so that the dispersion of a sum can be obtained as soon as we know the 
dispersion of its terms and their correlation coefficients. 

In an important case, expression (7) for dispersion can be greatly 
simplified. If the variables $x_1, x_2, \ldots x_n$ are independent in pairs, we 
see from (5) that all the correlation coefficients are 0, so that in this 
case simply 

$$D = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 = b_1 - a_1^2 + b_2 - a_2^2 + \cdots + b_n - a_n^2. \tag{8}$$

In other words, the dispersion of a sum of variables, any two of which 
are independent, is equal to the sum of dispersions of its terms. 

8. A few examples will serve to illustrate the use of these formulas. 
Problem 7. Find the dispersion of the number of successes in a series 
of n independent trials with probabilities $p_1, p_2, \ldots p_n$ corresponding to 
the first, second, . . . nth trial. 

Solution. As in Prob. 2 we associate with every trial a variable which 
assumes the value 1 or 0, according as the trial resulted in success or 
failure. These variables $x_1, x_2, \ldots x_n$ are independent because the 
trials are supposed to be independent. The number of successes 

$$m = x_1 + x_2 + \cdots + x_n$$

is thus the sum of the independent variables. To find the dispersion of 
any one of these variables $x_i$ we notice that 

$$E(x_i) = 1 \cdot p_i + 0 \cdot q_i = p_i$$

$$E(x_i^2) = 1 \cdot p_i + 0 \cdot q_i = p_i;$$

therefore the dispersion of $x_i$ is 

$$\sigma_i^2 = p_i - p_i^2 = p_iq_i$$

and by (8) 

$$D = E(m - p_1 - p_2 - \cdots - p_n)^2 = p_1q_1 + p_2q_2 + \cdots + p_nq_n.$$

In the Bernoullian case of independent trials with the same probability 
p, we have $p_1 = p_2 = \cdots = p_n = p$ and 

$$E(m - np)^2 = npq.$$

This formula is equivalent to the relation 

$$\sum_{m=0}^{n} T_m(m - np)^2 = npq$$

established on page 116. 
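
The result of Problem 7 can be checked by listing all $2^n$ sequences of successes and failures. The Python sketch below is illustrative and not part of the original text; the function name and the sample probabilities are assumptions.

```python
from fractions import Fraction
from itertools import product


def success_dispersion(ps):
    """Dispersion of the number of successes in independent trials with
    probabilities ps, computed by enumerating every sequence of outcomes."""
    mean = sum(ps)
    d = Fraction(0)
    for outcome in product((0, 1), repeat=len(ps)):
        pr = Fraction(1)
        for hit, p in zip(outcome, ps):
            pr *= p if hit else 1 - p  # independent trials: probabilities multiply
        d += pr * (sum(outcome) - mean) ** 2
    return d


if __name__ == "__main__":
    ps = [Fraction(k, 10) for k in range(1, 6)]
    print(success_dispersion(ps), sum(p * (1 - p) for p in ps))
```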

Problem 8. In a lottery $m$ tickets are drawn at a time out of $n$
tickets numbered from 1 to $n$. Find the dispersion of the sum $s$ of the
numbers on the tickets drawn.

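Solution. Let $x_1, x_2, \ldots, x_m$ be the variables representing the
numbers on the first, second, ..., $m$th tickets. By Prob. 5 we know that

$$E(x_i) = \frac{n + 1}{2}$$

and in a similar way we find

$$E(x_i^2) = \frac{1^2 + 2^2 + \cdots + n^2}{n} = \frac{(n + 1)(2n + 1)}{6},$$

whence the dispersion of $x_i$ is

$$E\left(x_i - \frac{n + 1}{2}\right)^2 = \frac{(n + 1)(2n + 1)}{6} - \frac{(n + 1)^2}{4} = \frac{n^2 - 1}{12}.$$

Since we deal in the present case with dependent variables, we must
find the correlation coefficients, or, which is the same,

$$E\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right)$$

for every pair of subscripts $i$ and $j$. The variable $x_i$ may have any of
the values 1, 2, 3, ..., $n$, with the same probability $1/n$; and $x_j$ may
have any of the same values with the exception of that assumed by $x_i$
with the probability

$$\frac{1}{n - 1},$$

so that the preceding expression consists of terms

$$\frac{1}{n(n - 1)}\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right)$$

where $x_j$ for given $x_i = 1, 2, \ldots, n$ ranges over all numbers 1, 2,
3, ..., $n$ with the exception of $x_i$. As

$$\sum_{x=1}^{n}\left(x - \frac{n + 1}{2}\right) = 0,$$

it is obvious that

$$\sum_{x_j \ne x_i}\left(x_j - \frac{n + 1}{2}\right) = -\left(x_i - \frac{n + 1}{2}\right)$$

and

$$E\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right) = -\frac{1}{n(n - 1)}\sum_{x=1}^{n}\left(x - \frac{n + 1}{2}\right)^2 = -\frac{n + 1}{12}.$$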

Everything now is ready for the application of (7). All simplifications
performed, we get the following expression of the required dispersion

$$D = \frac{m(n^2 - 1)}{12}\left(1 - \frac{m - 1}{n - 1}\right).$$

If the variables were independent, the dispersion would be

$$\frac{m(n^2 - 1)}{12}.$$

The dependence diminishes it, but the influence of dependence is not great 
if the ratio m/n is small. 
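For small $n$ and $m$ the dispersion found in Problem 8 can be verified by listing every possible drawing. The Python sketch below is an added illustration, not part of the original text; the function names are assumptions. It treats all sets of $m$ tickets as equally probable and compares the exact dispersion of the sum with the formula above.

```python
from fractions import Fraction
from itertools import combinations


def lottery_sum_dispersion(n, m):
    """Exact dispersion of the sum of m tickets drawn out of 1..n."""
    sums = [sum(c) for c in combinations(range(1, n + 1), m)]
    mean = Fraction(sum(sums), len(sums))
    return sum((s - mean) ** 2 for s in sums) / len(sums)


def lottery_sum_formula(n, m):
    """m(n^2 - 1)/12 * (1 - (m - 1)/(n - 1)), as obtained in Problem 8."""
    return Fraction(m * (n * n - 1), 12) * (1 - Fraction(m - 1, n - 1))


for n, m in [(7, 3), (10, 4), (12, 2)]:
    print(n, m, lottery_sum_dispersion(n, m), lottery_sum_formula(n, m))
```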

Problems for Solution 

1. Find the mathematical expectation $M$ of the absolute value of the discrepancy
$m - np$ in a series of $n$ independent trials with constant probability $p$. Ans. By
definition

$$M = \sum_{m=0}^{n} T_m |m - np|$$

where, as usual,

$$T_m = \frac{n!}{m!(n - m)!} p^m q^{n-m}.$$

But since

$$\sum_{m=0}^{n} T_m (m - np) = 0,$$

we have also

$$M = 2\sum_{m > np} T_m (m - np),$$

the sum being extended over all integers $m$ which are $> np$. Denoting by $F(x, y)$ the
sum

$$F(x, y) = \sum_{m > np} \frac{n!}{m!(n - m)!} x^m y^{n-m},$$

we have

$$\sum_{m > np} T_m (m - np) = p\frac{\partial F}{\partial p} - npF(p, q).$$

On the other hand, by Euler's theorem on homogeneous functions

$$nF(p, q) = p\frac{\partial F}{\partial p} + q\frac{\partial F}{\partial q},$$

whence

$$\sum_{m > np} T_m (m - np) = pq\left(\frac{\partial F}{\partial p} - \frac{\partial F}{\partial q}\right) = npq\,C_{n-1}^{\mu-1} p^{\mu-1} q^{n-\mu}.$$

Here $\mu$ represents an integer determined by

$$\mu \le np + 1 < \mu + 1.$$

The answer is therefore given by the simple formula

$$M = 2npq\,C_{n-1}^{\mu-1} p^{\mu-1} q^{n-\mu}.$$
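The closed expression for $M$ can be compared with the defining sum. The following Python sketch is an added illustration (the function names are assumptions, not notation from the text); it evaluates both sides in exact rational arithmetic for a few values of $n$ and $p$.

```python
from fractions import Fraction
from math import comb, floor


def mean_abs_deviation_direct(n, p):
    """Sum of T_m * |m - np| over m = 0, 1, ..., n."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m) * abs(m - n * p)
               for m in range(n + 1))


def mean_abs_deviation_closed(n, p):
    """2npq * C(n-1, mu-1) * p^(mu-1) * q^(n-mu) with mu <= np + 1 < mu + 1."""
    q = 1 - p
    mu = floor(n * p + 1)
    return 2 * n * p * q * comb(n - 1, mu - 1) * p**(mu - 1) * q**(n - mu)


for n, p in [(10, Fraction(3, 10)), (7, Fraction(2, 5)), (12, Fraction(1, 2))]:
    print(n, p, mean_abs_deviation_direct(n, p), mean_abs_deviation_closed(n, p))
```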

2. By applying Stirling's formula (Appendix 1, page 347) prove the following
result:


where 


c = max. I 

and n is so large as to make c ^ Jfo- 
Hint: 


\np — 1 ng — 1/ 


1^1 < 1 


, /_ hnpq\ d- , t?' 1 / 1 - 1 \ 

og y \ ^ ^ 2(?ip — 1 ?) 2{nq — ^') 24 


12(top - t?) 12(n2 - N) 4(np - 4(ng - tf')* 
3. What is the expectation of the number of failures preceding the first success in
an indefinite series of independent trials with the probability $p$?

Ans. $$qp + 2q^2p + 3q^3p + \cdots = \frac{pq}{(1 - q)^2} = \frac{q}{p}.$$

4. Balls are taken one by one out of an urn containing $a$ white and $b$ black balls
until the first white ball is drawn. What is the expectation of the number of black
balls preceding the first white ball?

Ans. 1. By direct application of definition the following first expression for the
required expectation $M$ is obtained:

$$M = \frac{a}{a + b}\left[\frac{b}{a + b - 1} + 2\,\frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + 3\,\frac{b(b - 1)(b - 2)}{(a + b - 1)(a + b - 2)(a + b - 3)} + \cdots\right].$$

Ans. 2. However, it is possible to find a simpler expression for $M$. Denote by $x_1$ the
number of black balls preceding the first white ball, by $x_2$ the number of black balls
between the first and second white ball, and so on; finally, by $x_{a+1}$ the number of black
balls following the last white ball. We have

$$x_1 + x_2 + \cdots + x_{a+1} = b$$

and

$$E(x_1) + E(x_2) + \cdots + E(x_{a+1}) = b.$$

But as the probability of every sequence of balls (that is, of every system of numbers
$x_1, x_2, \ldots, x_{a+1}$) is the same, namely,

$$\frac{a!\,b!}{(a + b)!},$$

it is easy to see that

$$E(x_1) = E(x_2) = \cdots = E(x_{a+1}) = M.$$

That is,

$$(a + 1)M = b$$

or

$$M = \frac{b}{a + 1}.$$


Equating this to the preceding expression for M, an interesting identity can be 
obtained, whose direct proof is left to the student. 
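Before attempting the direct proof, the identity can be tested numerically. The Python sketch below is an added illustration (the function name is an assumption); it sums the first expression term by term, taking the probability of exactly $k$ black balls followed by a white one, and compares the result with $b/(a + 1)$.

```python
from fractions import Fraction


def expected_blacks_direct(a, b):
    """Sum of k * P(exactly k black balls precede the first white ball)."""
    total = Fraction(0)
    for k in range(1, b + 1):
        prob = Fraction(1)
        for j in range(k):
            prob *= Fraction(b - j, a + b - j)  # k black balls in succession
        prob *= Fraction(a, a + b - k)          # followed by a white ball
        total += k * prob
    return total


for a, b in [(3, 5), (1, 4), (2, 2), (6, 10)]:
    print(a, b, expected_blacks_direct(a, b), Fraction(b, a + 1))
```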

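5. In Prob. 6, page 168, to determine the probability $\varphi(n)$, we had an equation

$$\varphi(n) + \frac{\varphi(n - 1)}{1!} + \frac{\varphi(n - 2)}{2!} + \cdots + \frac{\varphi(0)}{n!} = 1; \qquad \varphi(0) = 1.$$

Find the general expression for $\varphi(n)$ using the method of generating functions. Ans.
Let

$$F(x) = \varphi(0) + \varphi(1)x + \varphi(2)x^2 + \cdots$$

be the generating function of $\varphi(n)$. Multiplying this series by

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

we find

$$e^x F(x) = 1 + x + x^2 + \cdots = \frac{1}{1 - x}$$

or

$$F(x) = \frac{e^{-x}}{1 - x},$$

whence

$$\varphi(n) = 1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + \frac{(-1)^n}{n!}.$$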
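The expression obtained from the generating function can be confronted with the original equation. The Python sketch below is an added illustration; it assumes the equation in the form $\varphi(n) + \varphi(n-1)/1! + \cdots + \varphi(0)/n! = 1$ with $\varphi(0) = 1$, solves it step by step, and compares the values with the alternating series. The function names are assumptions.

```python
from fractions import Fraction
from math import factorial


def phi_closed(n):
    """1 - 1/1! + 1/2! - ... + (-1)^n / n!."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))


def phi_from_equation(n_max):
    """Solve phi(n) + phi(n-1)/1! + ... + phi(0)/n! = 1 successively."""
    phi = [Fraction(1)]
    for n in range(1, n_max + 1):
        rest = sum(phi[n - k] / factorial(k) for k in range(1, n + 1))
        phi.append(1 - rest)
    return phi


print(phi_from_equation(6))
print([phi_closed(n) for n in range(7)])
```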


6. The total number of balls in an urn is known, but the number of white balls
depends on chance and only its mathematical expectation is known. Find the
probability of drawing a white ball. Ans. Let $N$ be the total number of balls and $M$ the
expectation of the number of white balls. The required probability is $M/N$.

7. Two urns contain, respectively, $a$ white and $b$ black and $\alpha$ white and $\beta$ black
balls. A certain number $c$ (naturally not exceeding $a + b$) of balls is transferred
from the first urn into the second. What is the probability of drawing a white ball
from the second urn after the transfers? Ans. The required probability is

$$\frac{\alpha + \dfrac{ca}{a + b}}{\alpha + \beta + c}.$$


8. An urn contains $a$ white and $b$ black balls. After a ball is drawn, it is to be
returned to the urn if it is white; but if it is black, it is to be replaced by a white ball
from another urn. What is the probability of drawing a white ball after the foregoing
operation has been repeated $x$ times? Ans. Denote by $M_x$ the expectation of the
number of white balls after $x$ operations. From the equation

$$M_{x+1} = \left(1 - \frac{1}{a + b}\right)M_x + 1$$

the following expression for $M_x$ can be derived:

$$M_x = a + b - b\left(1 - \frac{1}{a + b}\right)^x.$$

It follows that the required probability is

$$p = \frac{M_x}{a + b} = 1 - \frac{b}{a + b}\left(1 - \frac{1}{a + b}\right)^x.$$


9. Urns 1 and 2 contain, respectively, $a$ white and $b$ black and $c$ white and $d$ black
balls. One ball is taken from the first urn and transferred into the second, while
simultaneously one ball taken from the second urn is transferred into the first. What
is the probability of drawing a white ball from the first urn after such an exchange
has been repeated $x$ times? Ans. Let $M_x$ and $P_x$ represent the mathematical
expectations of the number of white balls in the first and second urn after $x$ exchanges. Then

$$M_{x+1} = M_x + \frac{P_x}{c + d} - \frac{M_x}{a + b}; \qquad M_x + P_x = a + c$$

whence

$$p_x = \frac{M_x}{a + b} = \frac{a + c}{a + b + c + d} + \frac{ad - bc}{(a + b)(a + b + c + d)}\left(1 - \frac{1}{a + b} - \frac{1}{c + d}\right)^x.$$

10. An urn contains $pN$ white and $qN$ black balls, the total number of balls being
$N$. Balls are drawn one by one (without being returned to the urn) until a certain
number $n$ of balls is reached. What is the dispersion of the number $m$ of white balls
drawn? Ans. Let $x_i = 1$ if the $i$th ball drawn is white and $x_i = 0$ if it is black.
We have

$$E(x_i) = p, \qquad E(m) = np, \qquad E(x_i^2) = p$$

and

$$E(x_i - p)(x_j - p) = E(x_i x_j) - p^2 = -\frac{pq}{N - 1}.$$

The required dispersion is

$$D = E(m - np)^2 = npq\,\frac{N - n}{N - 1}.$$
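The answer to Problem 10 can be verified from the exact distribution of the number of white balls drawn without replacement. The Python sketch below is an added illustration; the function names and the sample urn are assumptions.

```python
from fractions import Fraction
from math import comb


def white_count_dispersion(white, black, n):
    """Exact dispersion of the number of white balls among n drawn."""
    total = comb(white + black, n)
    dist = {m: Fraction(comb(white, m) * comb(black, n - m), total)
            for m in range(max(0, n - black), min(n, white) + 1)}
    mean = sum(m * w for m, w in dist.items())
    return sum((m - mean) ** 2 * w for m, w in dist.items())


def dispersion_formula(white, black, n):
    """npq (N - n)/(N - 1) with p = white/N and q = black/N."""
    N = white + black
    p, q = Fraction(white, N), Fraction(black, N)
    return n * p * q * Fraction(N - n, N - 1)


print(white_count_dispersion(6, 4, 5), dispersion_formula(6, 4, 5))
```

Compared with the value $npq$ for independent trials, the factor $(N - n)/(N - 1)$ shows how drawing without replacement diminishes the dispersion.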


11. In a lottery containing $n$ numbers (1, 2, 3, ..., $n$) $m$ numbers are drawn at a
time. Let $x_i$ represent the frequency of a specified number $i$ in $N$ drawings. Prove
that

$$E(x_i) = Np, \qquad E(x_i - Np)^2 = Npq,$$
$$E(x_i - Np)(x_j - Np) = Np(p' - p); \qquad (i \ne j)$$

where

$$p = \frac{m}{n}, \qquad q = 1 - p, \qquad p' = \frac{m - 1}{n - 1}.$$

12. Let

$$z_i = (x_i - Np)^2 - Npq.$$

Show that the dispersion of the sum

$$z_1 + z_2 + \cdots + z_n$$

is

$$D = \frac{2N(N - 1)}{n - 1}(npq)^2.$$

Indication of the Proof. Let $N$ variables $\xi_1, \xi_2, \ldots, \xi_N$ be defined as follows:

$\xi_k = -p$ if in the $k$th drawing the number $i$ fails to appear,
$\xi_k = q$ if in the $k$th drawing the number $i$ appears.

In a similar way, we can define $N$ variables $\eta_1, \eta_2, \ldots, \eta_N$ associated with the
number $j \ne i$. Since

$$x_i - Np = \xi_1 + \xi_2 + \cdots + \xi_N$$
$$x_j - Np = \eta_1 + \eta_2 + \cdots + \eta_N$$

we have

$$e^{u(x_i - Np) + v(x_j - Np)} = e^{u\xi_1 + v\eta_1} \cdot e^{u\xi_2 + v\eta_2} \cdots e^{u\xi_N + v\eta_N}.$$

The variables

$$u\xi_1 + v\eta_1, \quad u\xi_2 + v\eta_2, \quad \ldots, \quad u\xi_N + v\eta_N$$

being independent, we have

$$E\left(e^{u(x_i - Np) + v(x_j - Np)}\right) = E(e^{u\xi_1 + v\eta_1}) \cdot E(e^{u\xi_2 + v\eta_2}) \cdots E(e^{u\xi_N + v\eta_N}).$$

But,

$$E(e^{u\xi_1 + v\eta_1}) = E(e^{u\xi_2 + v\eta_2}) = \cdots = E(e^{u\xi_N + v\eta_N}) =$$
$$= pp'e^{q(u + v)} + p(1 - p')e^{qu - pv} + p(1 - p')e^{qv - pu} + (q - p + pp')e^{-p(u + v)}$$
$$= \omega(u, v).$$

Hence

$$E\left(e^{u(x_i - Np) + v(x_j - Np)}\right) = \omega(u, v)^N.$$

It suffices to expand both members into power series in $u$ and $v$ and compare terms
involving $u^2v^2$ to find

$$E(z_i z_j); \qquad i \ne j.$$

The rest does not present serious difficulties except for somewhat complicated
calculations.

13. A box contains $2^n$ tickets among which $C_n^i$ tickets bear the number $i$ ($i =
0, 1, 2, \ldots, n$). A group of $m$ tickets is drawn; denoting by $s$ the sum of their
numbers, it is required to find the expectation $E$ and the dispersion $D$ of $s$.

Ans. $$E = \frac{1}{2}mn; \qquad D = \frac{1}{4}mn - \frac{m(m - 1)n}{4(2^n - 1)}.$$
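For small $n$ the answer to Problem 13 can be confirmed by enumerating every group of $m$ tickets. The Python sketch below is an added illustration (the function names are assumptions); it also evaluates the stated formulas for $n = 6$, $m = 10$, the case of 64 tickets drawn 10 at a time.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def ticket_box(n):
    """The 2^n tickets: the number i is written on C(n, i) of them."""
    return [i for i in range(n + 1) for _ in range(comb(n, i))]


def exact_moments(n, m):
    """Expectation and dispersion of the sum by complete enumeration."""
    sums = [sum(c) for c in combinations(ticket_box(n), m)]
    mean = Fraction(sum(sums), len(sums))
    disp = sum((s - mean) ** 2 for s in sums) / len(sums)
    return mean, disp


def stated_moments(n, m):
    """E = mn/2 and D = mn/4 - m(m - 1)n / (4(2^n - 1))."""
    mean = Fraction(m * n, 2)
    disp = Fraction(m * n, 4) - Fraction(m * (m - 1) * n, 4 * (2 ** n - 1))
    return mean, disp


print(exact_moments(4, 3), stated_moments(4, 3))
print(stated_moments(6, 10))
```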


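14. A box contains $k$ varieties of objects, the number of objects of each variety
being the same. These objects are drawn one at a time and put back before the
next drawing. Denoting by $n$ the smallest number of drawings which produce
objects of all varieties, find $E(n)$ and $E(n^2)$. Ans.

$$E(n) = k\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}\right),$$
$$E(n^2) = k^2\left[\left(1 + \frac{1}{2} + \cdots + \frac{1}{k}\right)^2 + 1 + \frac{1}{2^2} + \cdots + \frac{1}{k^2}\right] - k\left(1 + \frac{1}{2} + \cdots + \frac{1}{k}\right).$$

Use the result of Prob. 12, p. 41.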
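The two answers of Problem 14 can be checked by a different route from the one indicated in the hint. The Python sketch below is an added illustration and rests on an assumption not stated in the problem: that the number of drawings is split into successive waiting times, each being the number of drawings needed to obtain a new variety when a given number of varieties has already appeared. The function names are likewise assumptions.

```python
from fractions import Fraction


def collector_by_stages(k):
    """Expectation and dispersion of n as sums over independent waiting times."""
    mean = var = Fraction(0)
    for seen in range(k):
        p = Fraction(k - seen, k)  # chance that a drawing yields a new variety
        mean += 1 / p              # expectation of one waiting time
        var += (1 - p) / p ** 2    # dispersion of one waiting time
    return mean, var


def collector_stated(k):
    """E(n) and E(n^2) in the form given in the answer."""
    h1 = sum(Fraction(1, j) for j in range(1, k + 1))
    h2 = sum(Fraction(1, j * j) for j in range(1, k + 1))
    return k * h1, k * k * (h1 * h1 + h2) - k * h1


mean, var = collector_by_stages(4)
print(mean, float(var))
```

For $k = 4$ this gives the expectation $8\tfrac{1}{3}$ and a dispersion of about 14.44.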
CHAPTER X


THE LAW OF LARGE NUMBERS 


1. The developments of the preceding chapter, combined with a
simple lemma due to Tshebysheff, lead in a natural and easy way to a
far reaching generalization of Bernoulli's theorem, known under the
name of the "law of large numbers."

Tshebysheff's Lemma. Let $u$ be a variable which does not assume
negative values, and $a$ its mathematical expectation. The probability of the
inequality

$$u \le at^2$$

is always greater than

$$1 - \frac{1}{t^2}$$

whatever $t$ may be.

Proof. Let

$$u_1, u_2, \ldots, u_n$$

be all the possible values of the variable $u$ and

$$p_1, p_2, \ldots, p_n$$

their respective probabilities. By the definition of mathematical
expectation, we have

(1)   $$p_1u_1 + p_2u_2 + \cdots + p_nu_n = a.$$

We may suppose the notations so chosen that

$$u_1, u_2, \ldots, u_\alpha$$

are all the values of $u$ which are $\le at^2$, the remaining values

$$u_{\alpha+1}, u_{\alpha+2}, \ldots, u_n$$

being $> at^2$. If all the terms in (1) with subscripts 1, 2, ..., $\alpha$ are
dropped, the left-hand member can only be diminished, since these
terms are positive or at least nonnegative by hypothesis. We have,
therefore,

$$p_{\alpha+1}u_{\alpha+1} + \cdots + p_nu_n \le a.$$

But as

$$u_i > at^2$$

for $i = \alpha + 1, \alpha + 2, \ldots, n$ a still stronger inequality,

$$at^2(p_{\alpha+1} + \cdots + p_n) < a$$

or

$$p_{\alpha+1} + \cdots + p_n < \frac{1}{t^2}$$

will hold.

Here the left-hand member represents the probability $Q$ of the
inequality

$$u > at^2$$

because this inequality can materialize only in the following mutually
exclusive forms: either $u = u_{\alpha+1}$, or $u = u_{\alpha+2}$, ..., or $u = u_n$ whose
probabilities are, respectively, $p_{\alpha+1}, p_{\alpha+2}, \ldots, p_n$. Thus

$$Q < \frac{1}{t^2}.$$

But if $P$ is the probability of the opposite event

$$u \le at^2,$$

we must have

$$P + Q = 1,$$

whence

$$P > 1 - \frac{1}{t^2}$$

which proves the lemma.
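The lemma is easily tried on a numerical example. The Python sketch below is an added illustration; the variable chosen, with four nonnegative values, is an arbitrary assumption, as is the function name. It computes the probability $P$ of the inequality $u \le at^2$ and the bound $1 - 1/t^2$ for a given $t$.

```python
from fractions import Fraction


def lemma_check(values, probs, t):
    """Return (P, bound): P = probability of u <= a t^2, bound = 1 - 1/t^2."""
    t = Fraction(t)
    a = sum(p * u for u, p in zip(values, probs))  # mathematical expectation
    P = sum(p for u, p in zip(values, probs) if u <= a * t * t)
    return P, 1 - 1 / (t * t)


# An arbitrary nonnegative variable used only for illustration.
values = [0, 1, 4, 9]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
for t in (Fraction(3, 2), 2, 3):
    print(t, lemma_check(values, probs, t))
```

The bound is informative only for $t > 1$; for $t \le 1$ the quantity $1 - 1/t^2$ is not positive and the lemma asserts nothing.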

2. Let $x_1, x_2, \ldots, x_n$ be a set of stochastic variables and $a_1, a_2, \ldots, a_n$
their respective expectations. The dispersion of the sum

$$x_1 + x_2 + \cdots + x_n$$

which we shall denote by $B_n$ is, by definition, the mathematical
expectation of the variable

$$u = (x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2.$$

Tshebysheff's lemma, applied to this variable $u$, shows that the
probability of the inequality

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2 \le B_nt^2$$

is greater than

$$1 - \frac{1}{t^2}.$$

But the preceding inequality is equivalent to two inequalities

$$-t\sqrt{B_n} \le x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n \le t\sqrt{B_n}$$

or, dividing through by $n$,

$$-\frac{t\sqrt{B_n}}{n} \le \frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n} \le \frac{t\sqrt{B_n}}{n}.$$

Hence, the probability of these inequalities for an arbitrary positive $t$
is greater than

$$1 - \frac{1}{t^2}.$$

Let $\varepsilon$ be an arbitrary positive number. Defining $t$ by the equation

$$\frac{t\sqrt{B_n}}{n} = \varepsilon,$$

whence

$$t^2 = \frac{n^2\varepsilon^2}{B_n},$$

we arrive at the following conclusion: The probability $P$ of the inequalities

$$-\varepsilon \le \frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n} \le \varepsilon$$

equivalent to a single inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n}\right| \le \varepsilon$$

is greater than

$$1 - \frac{B_n}{n^2\varepsilon^2}.$$

Thus far nothing has been supposed about the behavior of $B_n$ for
indefinitely increasing $n$. We shall now suppose that the quotient
$B_n/n^2$ tends to 0 as $n$ increases indefinitely. Then, having chosen two
arbitrarily small positive numbers $\varepsilon$ and $\eta$, a number $n_0$ can be found so
that the inequality

$$\frac{B_n}{n^2\varepsilon^2} < \eta$$

will hold for $n > n_0$. Consequently, we shall have

$$P > 1 - \eta$$

for all $n > n_0$. This conclusion leads to the following important theorem
due, in the main, to Tshebysheff:

The Law of Large Numbers. With the probability approaching 1 or
certainty as near as we please, we may expect that the arithmetic mean of
values actually assumed by $n$ stochastic variables will differ from the arithmetic
mean of their expectations by less than any given number, however small,
provided the number of variables can be taken sufficiently large and provided
the condition

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

is fulfilled.

If, instead of variables $x_i$, we consider new variables $z_i = x_i - a_i$
with their means $= 0$, the same theorem can be stated as follows:

For a fixed $\varepsilon > 0$, however small, the probability of the inequality

$$\left|\frac{z_1 + z_2 + \cdots + z_n}{n}\right| \le \varepsilon$$

tends to 1 as a limit when $n$ increases indefinitely, provided

$$\frac{B_n}{n^2} \to 0.$$

This theorem is very general. It holds for independent or dependent
variables indifferently if the sufficient condition for its validity, namely,
that

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

is fulfilled.
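To see how the bound $1 - B_n/(n^2\varepsilon^2)$ compares with the true probability, one may take the Bernoullian case, in which $B_n = npq$ and the arithmetic mean is $m/n$. The Python sketch below is an added illustration (the function names and the numerical values of $n$, $p$, $\varepsilon$ are assumptions); it computes the exact probability of $|m/n - p| \le \varepsilon$ next to the lower bound.

```python
from fractions import Fraction
from math import comb


def exact_probability(n, p, eps):
    """P(|m/n - p| <= eps) for m successes in n independent trials."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m)
               for m in range(n + 1) if abs(Fraction(m, n) - p) <= eps)


def tshebysheff_bound(n, p, eps):
    """Lower bound 1 - B_n / (n^2 eps^2) with B_n = npq."""
    return 1 - (n * p * (1 - p)) / (n * n * eps * eps)


p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (100, 400):
    print(n, float(exact_probability(n, p, eps)), float(tshebysheff_bound(n, p, eps)))
```

The bound is far from sharp, but it tends to 1 as $n$ increases, which is all the theorem requires.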

3. This condition, which is recognized as sufficient, is at the same
time necessary, if the variables $z_1, z_2, \ldots, z_n$ are uniformly bounded;
that is, if a constant number (one independent of $n$), $C$, can be found
so that all particular values of $z_i$ $(i = 1, 2, \ldots, n)$ are numerically less
than $C$. Let $P$, as before, denote the probability of the inequality

$$|z_1 + z_2 + \cdots + z_n| \le n\varepsilon.$$

Then the probability of the opposite inequality

$$|z_1 + z_2 + \cdots + z_n| > n\varepsilon$$

will be $1 - P$.

Now, by definition,

$$B_n = E(z_1 + z_2 + \cdots + z_n)^2$$

whence one can easily derive the inequality

$$B_n < n^2C^2(1 - P) + n^2\varepsilon^2P$$

from which it follows that

$$\frac{B_n}{n^2} < C^2(1 - P) + \varepsilon^2P < \varepsilon^2 + C^2(1 - P).$$

If the law of large numbers holds, $1 - P$ converges to 0 when $n$
increases indefinitely, so that the right-hand member for sufficiently
large $n$ becomes less than any given number, and that implies

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

which proves the statement.

4. There is an important case in which the law of large numbers
certainly holds; namely, when variables $x_1, x_2, x_3, \ldots$ are independent
and the expectations of their squares are bounded. Then a constant
number $C$ exists such that

$$b_i = E(x_i^2) < C \quad \text{for } i = 1, 2, 3, \ldots.$$

On the other hand, for independent variables

$$B_n = \sum_{i=1}^{n} b_i - \sum_{i=1}^{n} a_i^2 < nC$$

and

$$\frac{B_n}{n^2} < \frac{C}{n} \to 0 \quad \text{as } n \to \infty.$$

The expectations of squares are bounded, for instance, when all the
variables are uniformly bounded, which is true, for instance, for
"identical" or "equal" variables. Variables are said to be identical if they
possess the same set of values with the same corresponding probabilities.

5. E. Czuber made a complete investigation of the results of 2,854
drawings in a lottery operated in Prague between 1754 and 1886. It
consisted of 90 numbers, of which 5 were taken in each drawing. From
Czuber's book "Wahrscheinlichkeitsrechnung," vol. 1, p. 141 (2d ed.,
1908), we reprint the table shown on page 187.

With the 2,854 drawings, we associate 2,854 variables $x_1, x_2, \ldots, x_{2854}$
representing the sum of five numbers appearing in each of the 2,854
drawings. These variables are identical and independent with the
common mathematical expectation 227.5. Hence, by the law of large
numbers, we can expect that the arithmetic mean of actually observed
values of these variables will not notably differ from 227.5. It remains
to form the sum

$$S = \sum_{i=1}^{2854} x_i$$

from the observed frequencies of the 90 numbers, which are as follows.
Numbers                 Their frequency m    Difference m - 158

6                       138                  -20
39, 65                  139                  -19
16, 41, 76, 87          142                  -16
2, 14, 56, 79, 86       143                  -15
18, 44, 47              144                  -14
72, 80                  145                  -13
12                      146                  -12
21, 53                  147                  -11
70                      149                   -9
24, 32, 55, 69          150                   -8
27, 64, 75              151                   -7
81                      152                   -6
23, 29, 85              153                   -5
19, 35, 42, 74          154                   -4
7, 20, 59               155                   -3
13, 34, 40, 67, 88      156                   -2
11, 52, 68              157                   -1
17, 82                  158                    0
15, 90                  159                    1
58                      160                    2
8, 25, 36               161                    3
22                      162                    4
33, 57                  163                    5
51                      164                    6
3, 43, 45, 48           165                    7
10, 26, 66              166                    8
1, 5, 60, 84            167                    9
50, 62                  168                   10
9, 61, 63               170                   12
54, 73                  171                   13
49, 71, 78              172                   14
28                      173                   15
37                      176                   18
30, 46                  177                   19
89                      178                   20
31                      179                   21
38                      184                   26
4                       185                   27
77                      186                   28
83                      189                   31
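The arithmetic of this section can be reproduced mechanically from the table. The Python sketch below is an added illustration; the name `TABLE` and the other identifiers are assumptions, while the data are copied from the table above. It forms the sum $S$ by multiplying each frequency by the sum of the numbers in its line, and also splits $S$ into $158 \cdot 4095$ plus the positive and negative products of the differences $m - 158$.

```python
# Each entry: (frequency m, numbers drawn with that frequency), from the table.
TABLE = [
    (138, [6]), (139, [39, 65]), (142, [16, 41, 76, 87]),
    (143, [2, 14, 56, 79, 86]), (144, [18, 44, 47]), (145, [72, 80]),
    (146, [12]), (147, [21, 53]), (149, [70]), (150, [24, 32, 55, 69]),
    (151, [27, 64, 75]), (152, [81]), (153, [23, 29, 85]),
    (154, [19, 35, 42, 74]), (155, [7, 20, 59]), (156, [13, 34, 40, 67, 88]),
    (157, [11, 52, 68]), (158, [17, 82]), (159, [15, 90]), (160, [58]),
    (161, [8, 25, 36]), (162, [22]), (163, [33, 57]), (164, [51]),
    (165, [3, 43, 45, 48]), (166, [10, 26, 66]), (167, [1, 5, 60, 84]),
    (168, [50, 62]), (170, [9, 61, 63]), (171, [54, 73]),
    (172, [49, 71, 78]), (173, [28]), (176, [37]), (177, [30, 46]),
    (178, [89]), (179, [31]), (184, [38]), (185, [4]), (186, [77]),
    (189, [83]),
]
BASE = 158       # reference frequency used in the text
DRAWINGS = 2854  # number of drawings

all_numbers = sorted(k for _, nums in TABLE for k in nums)
positive = sum((m - BASE) * sum(nums) for m, nums in TABLE if m > BASE)
negative = sum((m - BASE) * sum(nums) for m, nums in TABLE if m < BASE)
S = sum(m * sum(nums) for m, nums in TABLE)

print(BASE * sum(all_numbers), positive, negative, S, round(S / DRAWINGS, 2))
```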


To form this sum we must multiply the frequencies given in the preceding
table by the sum of corresponding numbers. To simplify the task we notice
that all numbers from 1 to 90 actually appeared. Hence, we multiply the
sum of these numbers, 4,095, by 158, which gives:

$$4095 \cdot 158 = 647{,}010,$$

and then add to this number the sum of the differences $m - 158$
multiplied by the sum of the numbers in the same line. The results are:

Sum of positive products      22,336
Sum of negative products     -19,587.

Hence

$$S = 647{,}010 + 22{,}336 - 19{,}587 = 649{,}759$$

and

$$\frac{S}{2854} = 227.67,$$

which differs very little from the expected value 227.5. An even larger
difference would be in perfect agreement with the law of large numbers
since 2,854, the number of variables, is not very great.

6. The two experiments reported in this section were made by the
author in spare moments. In the first experiment 64 tickets bearing
numbers 0, 1, 2, 3, 4, 5, 6 and occurring in the following proportions:

Number       0    1    2    3    4    5    6
Frequency    1    6   15   20   15    6    1

were vigorously agitated in a tin can and then 10 tickets were drawn at a
time and their numbers added. Altogether 2,500 such drawings were
made and their results carefully recorded. From these records we
derive Tables I and II.

Table I

Number    Frequency observed    Expected frequency    Discrepancy

0           404                   390.625              +13.375
1         2,321                 2,343.75               -22.75
2         5,850                 5,859.375               -9.375
3         7,863                 7,812.5                +50.5
4         5,821                 5,859.375              -38.375
5         2,344                 2,343.75                +0.25
6           397                   390.625               +6.375


The next table gives the absolute values of differences $s - 30$ where $s$
is the sum of the numbers on 10 tickets drawn at one time, and their
respective frequencies.

Table II

|s - 30|    Frequency observed    |s - 30|    Frequency observed

0           246                    7           71
1           549                    8           44
2           479                    9           25
3           379                   10            8
4           324                   11            4
5           241                   12            1
6           129

From Table I it is easy to find that the arithmetic mean of all 2,500
sums observed is:

$$\frac{74996}{2500} = 29.9984$$

whereas the expectation of each of the 2,500 identical variables under
consideration by Prob. 13, page 181, is 30. By the same problem the
dispersion of $s$, that is, $E(s - 30)^2$, is 12.857. On the other hand, from
Table II we find that

$$\Sigma(s - 30)^2 = 31477$$

and

$$\frac{\Sigma(s - 30)^2}{2500} = 12.5908,$$

fairly close to 12.857.
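The two quotients just quoted follow directly from Tables I and II. The Python sketch below is an added illustration; the dictionary names are assumptions and the frequencies are copied from the tables.

```python
# Table I: number on a ticket -> observed frequency among the 25,000 tickets drawn.
table_I = {0: 404, 1: 2321, 2: 5850, 3: 7863, 4: 5821, 5: 2344, 6: 397}

# Table II: |s - 30| -> observed frequency among the 2,500 sums.
table_II = {0: 246, 1: 549, 2: 479, 3: 379, 4: 324, 5: 241, 6: 129,
            7: 71, 8: 44, 9: 25, 10: 8, 11: 4, 12: 1}

drawings = 2500
total_of_sums = sum(number * freq for number, freq in table_I.items())
sum_of_squares = sum(d * d * freq for d, freq in table_II.items())

print(total_of_sums, total_of_sums / drawings)
print(sum_of_squares, sum_of_squares / drawings)
```

The total of all 2,500 sums equals the total of the numbers on all tickets drawn, which is why Table I suffices for the arithmetic mean.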

In the second experiment we tried to produce cards of every suit in $n$
drawings ($n$ being the smallest number required) of one card at a time,
each card taken being returned before the next drawing. By Prob. 14,
page 181, we find that the expectation and the dispersion of this number
$n$ are, respectively, $8\tfrac{1}{3}$ and 14.44. Altogether 3,000 values of $n$ were
recorded, of which 33 was the largest. Values of the difference $n - 8$ are
given in Table III.


Table III

n - 8    Frequency    n - 8    Frequency    n - 8    Frequency

-4                     6        77           16        3
-3                     7        50           17        5
-2       426           8        40           18        2
-1       407           9        31           19        1
 0       348          10        17           20        3
 1       247          11        15           21        1
 2       228          12        13           22        1
 3       156          13         6           23        1
 4       116          14         9           24        0
 5        88          15         6           25        1

From this table we find

$$\Sigma(n - 8) = 965, \qquad \Sigma(n - 8)^2 = 43{,}395,$$

whence

$$\Sigma\left(n - 8\tfrac{1}{3}\right)^2 = \Sigma(n - 8)^2 - \tfrac{2}{3}\Sigma(n - 8) + \frac{3000}{9} = 43{,}085,$$
$$\Sigma n = 24{,}965.$$

By the law of large numbers we may expect that the quotients

$$\frac{\Sigma n}{3000}, \qquad \frac{\Sigma\left(n - 8\tfrac{1}{3}\right)^2}{3000}$$

will not considerably differ from $8\tfrac{1}{3}$ and 14.44, respectively. As a
matter of fact,

$$\frac{\Sigma n}{3000} = 8.322, \qquad \frac{\Sigma\left(n - 8\tfrac{1}{3}\right)^2}{3000} = 14.362.$$

There is a very satisfactory agreement between the theory and this 
experiment in another respect. Of 24,965 cards drawn there were 

6,304 hearts 
6,236 diamonds 
6,131 clubs 
6,294 spades 

whereas the expected number for each suit is 6241.25. 
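The figures of the second experiment are connected by simple identities which can be checked at once. The Python sketch below is an added illustration; it starts from the two sums $\Sigma(n - 8) = 965$ and $\Sigma(n - 8)^2 = 43{,}395$ as quoted in the text (not from the individual entries of Table III), and the variable names are assumptions.

```python
from fractions import Fraction

N = 3000        # number of recorded values of n
sum_d = 965     # sum of (n - 8), as quoted in the text
sum_d2 = 43395  # sum of (n - 8)^2, as quoted in the text
third = Fraction(1, 3)

sum_n = 8 * N + sum_d
# (n - 8 - 1/3)^2 = (n - 8)^2 - (2/3)(n - 8) + 1/9, summed over all N values.
sum_centered = sum_d2 - 2 * third * sum_d + N * third ** 2

suits = {"hearts": 6304, "diamonds": 6236, "clubs": 6131, "spades": 6294}

print(sum_n, sum_centered)
print(round(sum_n / N, 3), round(float(sum_centered) / N, 3))
print(sum(suits.values()), sum_n / 4)
```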

7. So far, we have dealt with stochastic variables having only a finite
number of values. However, the notion of mathematical expectation,
and the propositions essentially based on this notion, can be extended to
variables with infinitely many values. Here we shall consider the
simplest case of variables with a countable set of values, that can be
arranged in a sequence

$$\cdots < a_{-2} < a_{-1} < a_0 < a_1 < a_2 < \cdots$$

in the order of their magnitude.

With this sequence is associated the sequence of probabilities

$$\ldots, p_{-2}, p_{-1}, p_0, p_1, p_2, \ldots$$

so that in general $p_i$ is the probability for $x$ to assume the value $a_i$.
These probabilities are subject to the condition that the series

$$\Sigma p_i = \cdots + p_{-2} + p_{-1} + p_0 + p_1 + p_2 + \cdots$$

must be convergent with the sum 1.

The definition of mathematical expectation is essentially the same
as that for variables with a finite number of values, but instead of a
finite sum, we have an infinite series

$$E(x) = \Sigma p_ia_i$$

provided this series is convergent (it is absolutely convergent, if
convergent at all). If this series is divergent, it is meaningless to speak of
the mathematical expectation of $x$. Likewise, the mathematical
expectation of any function $\varphi(x)$ is defined as being the sum of the series

$$E\{\varphi(x)\} = \Sigma p_i\varphi(a_i),$$

provided the latter is convergent.

It can easily be seen that various theorems established in Chap. IX,
as well as Tshebysheff's lemma, continue to hold when the various
mathematical expectations involved exist.

The law of large numbers follows, as a simple corollary, from
Tshebysheff's lemma if the following requirements are fulfilled:

a. Mathematical expectations of all variables $x_1, x_2, x_3, \ldots$ exist.
b. The dispersion $B_n$ of the sum $x_1 + x_2 + \cdots + x_n$ exists.
c. The quotient $B_n/n^2$ tends to 0 as $n$ tends to infinity.

The first requirement is absolutely indispensable. Without it the 
theorem itself cannot be stated. The second requirement (not to speak 
of the third) need not be fulfilled; and still the law of large numbers may 
hold, as Markoff pointed out. 

8. Let $x_1, x_2, x_3, \ldots$ be independent variables. If for every $i$
the mathematical expectation

$$E(x_i^2)$$

exists, the quantity $B_n$ exists also. But if at least one of these
expectations does not exist, the quantity $B_n$ has no meaning. However, the
following theorem, due to Markoff, holds:

Theorem. The law of large numbers holds, provided that for some
$\delta > 0$ all the mathematical expectations

$$E(|x_i|^{1+\delta}); \qquad i = 1, 2, 3, \ldots$$

exist and are bounded.

Proof. For the sake of simplicity we may assume that

$$E(x_i) = 0; \qquad i = 1, 2, 3, \ldots.$$

For, supposing

$$E(x_i) = a_i; \qquad i = 1, 2, 3, \ldots$$

instead of $x_i$, we may consider new variables

$$z_i = x_i - a_i.$$

Then

$$E(z_i) = 0$$

and it remains to prove the existence and boundedness of

$$E(|z_i|^{1+\delta}); \qquad i = 1, 2, 3, \ldots.$$

The proof follows immediately from the inequalities

$$|x_i - a_i|^{1+\delta} \le 2^{\delta}\left(|x_i|^{1+\delta} + |a_i|^{1+\delta}\right), \qquad |a_i|^{1+\delta} \le E(|x_i|^{1+\delta}),$$

the first of which is well known; the second is a particular case of
Liapounoff's inequality, established in Chap. XIII, page 265.

Thus, from the outset we are entitled to assume that

$$E(x_i) = 0.$$

The proof of the theorem is based on a very ingenious and useful
device due to Markoff. Let $N$ be a positive number which later we shall
increase indefinitely. Together with $x_i$ we shall consider two new
variables, $u_i$ and $v_i$, defined as follows: $a$ being a particular value of $x_i$, the
corresponding values of $u_i$ and $v_i$ are

$$u_i = a, \qquad v_i = 0$$

if $|a| \le N$ and

$$u_i = 0, \qquad v_i = a$$

if $|a| > N$. Thus, stochastic variables $u_i$ and $v_i$ are completely defined.
Evidently

$$x_i = u_i + v_i$$

whence

$$0 = E(u_i) + E(v_i)$$

and

$$\beta_i = E(u_i) = -E(v_i).$$

Now

$$E(|v_i|^{1+\delta}) \le E(|x_i|^{1+\delta}) < c$$

by hypothesis. Since $v_i$ is either 0 or its absolute value is $> N$, we have

$$N^{\delta}E(|v_i|) \le E(|v_i|^{1+\delta}) < c,$$

whence

(2)   $$|\beta_i| = |E(v_i)| < \frac{c}{N^{\delta}}.$$

Likewise, the probability $q_i$ for $v_i \ne 0$ satisfies the inequality

$$N^{1+\delta}q_i \le E(|v_i|^{1+\delta}) < c,$$

whence

(3)   $$q_i < \frac{c}{N^{1+\delta}}.$$





Now, let us consider two inequalities

(4)   $$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \sigma$$

(5)   $$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \sigma$$

where $\sigma$ is an arbitrary positive number and let $P_0$ and $P$ be their
respective probabilities. The inequalities (4) and (5) coincide when

$$v_1 = v_2 = \cdots = v_n = 0.$$

With this supplementary condition they have the same probability $Q$.
But they can hold also when at least one of the numbers

$$v_1, v_2, \ldots, v_n$$

is different from 0. Let the probabilities of (4) and (5) under such
circumstances be $R_0$ and $R$. Then

$$P_0 = Q + R_0, \qquad P = Q + R.$$

But evidently neither $R_0$ nor $R$ can exceed the probability that in the
series

$$v_1, v_2, \ldots, v_n$$

at least one number is different from 0; this probability in turn does not
exceed (see Chap. II, page 30)

$$q_1 + q_2 + \cdots + q_n < \frac{nc}{N^{1+\delta}}.$$

Hence

$$R_0 < \frac{nc}{N^{1+\delta}}, \qquad R < \frac{nc}{N^{1+\delta}}$$

and

(6)   $$|P - P_0| < \frac{nc}{N^{1+\delta}}.$$

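On the other hand, since none of the values of $u_i$ $(i = 1, 2, \ldots, n)$
exceeds $N$, we have

$$E(u_i^2) \le N^{1-\delta}E(|u_i|^{1+\delta}) < cN^{1-\delta}.$$

Accordingly, the dispersion of the sum $u_1 + u_2 + \cdots + u_n$ will be
less than

$$cnN^{1-\delta}.$$

Hence, by what has been proved in Sec. 2, the probability of the
inequality

(7)   $$\left|\frac{u_1 + u_2 + \cdots + u_n}{n} - \frac{\beta_1 + \beta_2 + \cdots + \beta_n}{n}\right| \le \frac{\varepsilon}{2}$$

is greater than

$$1 - \frac{4cN^{1-\delta}}{\varepsilon^2n}.$$

But whenever (7) is satisfied, the inequality

(8)   $$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \frac{\varepsilon}{2} + \frac{|\beta_1 + \beta_2 + \cdots + \beta_n|}{n}$$

is also satisfied. Hence, the probability of this inequality is a fortiori
greater than

$$1 - \frac{4cN^{1-\delta}}{\varepsilon^2n}.$$

Owing to inequalities (2), the following inequality follows from (8):

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \frac{\varepsilon}{2} + \frac{c}{N^{\delta}}.$$

Hence

$$P_0 > 1 - \frac{4cN^{1-\delta}}{\varepsilon^2n}$$

and on account of (6)

$$P > 1 - \frac{4cN^{1-\delta}}{\varepsilon^2n} - \frac{nc}{N^{1+\delta}}.$$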

Now we can dispose of the arbitrary number $N$ by taking

$$N = \frac{\varepsilon n}{2}.$$

Then

$$P > 1 - 2c\left(\frac{2}{\varepsilon}\right)^{1+\delta}n^{-\delta}.$$

Now $N$ tends to infinity with $n$ and as soon as $n$ surpasses a certain
limit $n_0$, the fraction

$$\frac{c}{N^{\delta}}$$

will become and remain less than $\varepsilon/2$. The probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \varepsilon$$

for $n > n_0$ will be greater than $P$ and consequently greater than

$$1 - 2c\left(\frac{2}{\varepsilon}\right)^{1+\delta}n^{-\delta}.$$

It tends, therefore, to 1 as $n$ tends to infinity, and that proves Markoff's
theorem.
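The choice $N = \varepsilon n/2$ makes the two subtracted terms $4cN^{1-\delta}/(\varepsilon^2 n)$ and $nc/N^{1+\delta}$ equal, which is how the final bound arises. The Python sketch below is an added illustration; the function names and the numerical values of $c$, $\delta$, $\varepsilon$ are arbitrary assumptions. It evaluates both terms and the resulting lower bound in floating-point arithmetic.

```python
from math import isclose


def markoff_terms(n, eps, c, delta, N):
    """The two terms subtracted from 1 in the bound for P, for a given N."""
    first = 4 * c * N ** (1 - delta) / (eps ** 2 * n)
    second = n * c / N ** (1 + delta)
    return first, second


def markoff_bound(n, eps, c, delta):
    """1 - 2c (2/eps)^(1+delta) n^(-delta), the bound obtained with N = eps*n/2."""
    return 1 - 2 * c * (2 / eps) ** (1 + delta) * n ** (-delta)


eps, c, delta = 0.1, 1.0, 0.5  # illustrative values only
for n in (10 ** 6, 10 ** 8):
    first, second = markoff_terms(n, eps, c, delta, eps * n / 2)
    print(n, first, second, markoff_bound(n, eps, c, delta))
```

With a small $\delta$ the factor $n^{-\delta}$ decreases slowly, so that very large $n$ may be needed before the bound becomes informative.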

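Example. Let the possible values of the variable $x_p$ $(p = 1, 2, 3, \ldots)$ be

$$p^{-\frac{1}{2}}(p + 1)^{\frac{1}{2}}, \quad p^{-\frac{1}{2}}(p + 1)^{\frac{2}{2}}, \quad p^{-\frac{1}{2}}(p + 1)^{\frac{3}{2}}, \quad \ldots$$

with the corresponding probabilities

$$\frac{p}{p + 1}, \quad \frac{p}{(p + 1)^2}, \quad \frac{p}{(p + 1)^3}, \quad \ldots.$$

Since the series

$$\frac{p}{p + 1} \cdot \frac{p + 1}{p} + \frac{p}{(p + 1)^2} \cdot \frac{(p + 1)^2}{p} + \frac{p}{(p + 1)^3} \cdot \frac{(p + 1)^3}{p} + \cdots = 1 + 1 + 1 + \cdots$$

is divergent, the mathematical expectation

$$E(x_p^2)$$

does not exist. Yet the law of large numbers holds. For

$$E(|x_p|^{1+\delta}) = p^{\frac{1-\delta}{2}}\sum_{k=1}^{\infty}\frac{1}{(p + 1)^{\frac{k}{2}(1-\delta)}}$$

is a convergent series for any $0 < \delta < 1$. Moreover,

$$p^{\frac{1-\delta}{2}}\sum_{k=1}^{\infty}\frac{1}{(p + 1)^{\frac{k}{2}(1-\delta)}} = \frac{p^{\frac{1-\delta}{2}}}{(p + 1)^{\frac{1-\delta}{2}} - 1} \le \frac{1}{2^{\frac{1-\delta}{2}} - 1}$$

and consequently the conditions of Markoff's theorem are satisfied for any $0 < \delta < 1$.
Hence, the law of large numbers holds in this example.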


9. If variables $x_1, x_2, x_3, \ldots$ are identical, the law of large numbers
holds without any other restrictions, except that for these variables
mathematical expectations exist. In fact, Khintchine proved the following
theorem:

Theorem. If, as we may naturally suppose, $E(x_i) = 0$, the probability
of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \varepsilon$$

tends to 1 as $n$ increases indefinitely.

Proof. The proof is quite similar to that of Markoff's theorem and
is based on the same ingenious artifice. Let

$$\cdots < a_{-2} < a_{-1} < a_0 < a_1 < a_2 < \cdots$$

be different values of any one of the identical variables $x_1, x_2, x_3, \ldots$ and

$$\ldots, p_{-2}, p_{-1}, p_0, p_1, p_2, \ldots$$

their probabilities. By hypothesis

$$\Sigma p_ia_i$$

is a convergent series with the sum 0. The series

$$\Sigma p_i|a_i|$$

is also convergent; let $c > 0$ be its sum.

Keeping the same notations as before, we have

$$|\beta_i| \le E(|v_i|) = \sum_{|a_i| > N} p_i|a_i| = \psi(N)$$

where $\psi(N)$ is a decreasing function tending to 0 as $N \to \infty$. Also

$$E(u_i^2) \le NE|x_i| = cN$$

so that the dispersion of the sum

$$u_1 + u_2 + \cdots + u_n$$

is less than

$$cNn.$$

Consequently the probability of the inequality

(9)   $$\left|\frac{u_1 + u_2 + \cdots + u_n}{n} - \frac{\beta_1 + \beta_2 + \cdots + \beta_n}{n}\right| \le \frac{\varepsilon}{2}$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2n}.$$

On the other hand, the probability $q_i$ of the inequality $v_i \ne 0$ is less
than

$$\frac{\psi(N)}{N}$$

because

$$q_i = \sum_{|a_i| > N} p_i$$

and

$$N\sum_{|a_i| > N} p_i < \sum_{|a_i| > N} p_i|a_i| = \psi(N).$$

Hence, the difference between the probability of the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \sigma$$

and that of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \sigma$$

is numerically less than

$$\frac{n\psi(N)}{N}.$$

As in the preceding section we conclude that the probability of the
inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \frac{\varepsilon}{2} + \psi(N)$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2n}.$$

Finally, the probability of the inequality

(10)   $$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \frac{\varepsilon}{2} + \psi(N)$$

is greater than

$$1 - \frac{4cN}{\varepsilon^2n} - \frac{n\psi(N)}{N}.$$

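To dispose of $N$ we observe that the ratio

$$\frac{\sqrt{\psi(N)}}{N}$$

is a decreasing function of $N$ and tends to 0 as $N \to \infty$. Hence, at least
for large $n$, there exists an integer $N$ such that

$$\frac{\sqrt{\psi(N)}}{N} < \frac{\sqrt{4c}}{\varepsilon n} \le \frac{\sqrt{\psi(N - 1)}}{N - 1}.$$

Then

$$\frac{n\psi(N)}{N} < \frac{\sqrt{4c}}{\varepsilon}\sqrt{\psi(N)}, \qquad \frac{4cN}{\varepsilon^2n} \le \frac{\sqrt{4c}}{\varepsilon} \cdot \frac{N}{N - 1}\sqrt{\psi(N - 1)},$$

whence it follows that the probability of inequality (10) is greater than

$$1 - \frac{\sqrt{4c}}{\varepsilon}\left[\sqrt{\psi(N)} + \frac{N}{N - 1}\sqrt{\psi(N - 1)}\right].$$

Now $N$ increases indefinitely together with $n$; therefore, for all $n$
above a certain limit $n_0$,

$$\psi(N) < \frac{\varepsilon}{2}$$

so that for $n > n_0$ the probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \varepsilon$$

will be greater than

$$1 - \frac{\sqrt{4c}}{\varepsilon}\left[\sqrt{\psi(N)} + \frac{N}{N - 1}\sqrt{\psi(N - 1)}\right]$$

and with indefinitely increasing $n$ will approach the limit 1. Thus
Khintchine's theorem is completely proved.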

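Example. Let

$$2^{1 - 2\log 1}, \quad 2^{2 - 2\log 2}, \quad 2^{3 - 2\log 3}, \quad \ldots, \quad 2^{n - 2\log n}, \quad \ldots$$

be all possible values of identical variables $x_1, x_2, x_3, \ldots$ and

$$\frac{1}{2}, \quad \frac{1}{2^2}, \quad \frac{1}{2^3}, \quad \ldots, \quad \frac{1}{2^n}, \quad \ldots$$

their corresponding probabilities. Since the series

$$\frac{1}{2^{2\log 1}} + \frac{1}{2^{2\log 2}} + \frac{1}{2^{2\log 3}} + \cdots = 1 + \frac{1}{2^{\log 4}} + \frac{1}{3^{\log 4}} + \cdots$$

is convergent, mathematical expectations of the variables $x_1, x_2, x_3, \ldots$ exist.
Hence, the law of large numbers holds in this case.

Markoff's theorem cannot be applied here, because for any positive $\delta$ the series

$$\sum_{n=1}^{\infty}\frac{2^{n\delta}}{n^{(1+\delta)\log 4}}$$

is divergent.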


Problems for Solution 

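1. Let $x$ be a stochastic variable with the mean $= 0$ and the standard deviation $\sigma$.
Denoting by $P(t)$ the probability of the inequality

$$x \le t$$

show that

$$P(t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad \text{for } t < 0,$$
$$1 - P(t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad \text{for } t > 0.$$

Show also that the right-hand members cannot be replaced by smaller numbers.
Indication of the Proof. Since

$$\Sigma p_ix_i = 0, \qquad \Sigma p_ix_i^2 = \sigma^2,$$

we have also

$$\Sigma p_i(x_i - t) = -t, \qquad \Sigma p_i(x_i - t)^2 = \sigma^2 + t^2$$

whence, supposing that $x_i > t$ for $i = 1, 2, \ldots, s$ and first taking $t$ negative,

$$\sum_{i=1}^{s} p_i(x_i - t) \ge -t,$$
$$t^2 \le \left[\sum_{i=1}^{s} p_i(x_i - t)\right]^2 \le \sum_{i=1}^{s} p_i \cdot \sum_{i=1}^{s} p_i(x_i - t)^2 \le [1 - P(t)](\sigma^2 + t^2),$$
$$1 - P(t) \ge \frac{t^2}{\sigma^2 + t^2}, \qquad P(t) \le \frac{\sigma^2}{\sigma^2 + t^2}.$$

For positive $t$ the proof is quite similar. Considering a stochastic variable with
two values:

$$x_1 = t, \qquad p_1 = \frac{\sigma^2}{\sigma^2 + t^2},$$
$$x_2 = -\frac{\sigma^2}{t}, \qquad p_2 = \frac{t^2}{\sigma^2 + t^2}$$

one can easily prove the last part of our statement.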

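2. Tshebysheff's Problem.¹ If $x$ is a positive stochastic variable with given

$$E(x) = \sigma^2, \qquad E(x^2) = \tau^4,$$

then the probability $P$ of the inequality

$$x \ge v$$

has the following precise upper bounds:

$$P \le 1 \quad \text{for } v \le \sigma^2,$$

$$P \le \frac{\sigma^2}{v} \quad \text{for } \sigma^2 \le v \le \frac{\tau^4}{\sigma^2},$$

$$P \le \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v} \quad \text{for } v \ge \frac{\tau^4}{\sigma^2}.$$

Indication of the Proof. Let

$$\xi = \frac{\sigma^2 v - \tau^4}{v - \sigma^2}.$$

Then $\xi \ge 0$ if $v \ge \tau^4/\sigma^2$ and

$$P \le E\left(\frac{x - \xi}{v - \xi}\right)^2$$

since

$$\left(\frac{x - \xi}{v - \xi}\right)^2 \ge 1$$

for $x \ge v$. On the other hand,

$$E(x - \xi)^2 = \tau^4 - 2\sigma^2\xi + \xi^2,$$

whence

$$P \le \frac{\tau^4 - 2\sigma^2\xi + \xi^2}{(v - \xi)^2} = \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v}.$$

The equality sign is reached for the stochastic variable with two values:

$$x_1 = \xi, \quad p_1 = \frac{(v - \sigma^2)^2}{\tau^4 + v^2 - 2\sigma^2 v}; \qquad x_2 = v, \quad p_2 = \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2 v}.$$

If $\sigma^2 \le v < \tau^4/\sigma^2$ we have an obvious inequality

$$P \le \frac{E(x)}{v} = \frac{\sigma^2}{v}.$$

To show that the right-hand member cannot be replaced by a smaller number, consider the following stochastic variable with three values:

$$x_1 = 0, \quad p_1 = \frac{(l - \sigma^2)v - l\sigma^2 + \tau^4}{lv};$$

$$x_2 = v, \quad p_2 = \frac{l\sigma^2 - \tau^4}{v(l - v)};$$

$$x_3 = l, \quad p_3 = \frac{\tau^4 - \sigma^2 v}{l(l - v)},$$

where $l > v$ is an arbitrary number. For this variable

$$P = p_2 + p_3 = \frac{\sigma^2}{v} - \frac{\tau^4 - \sigma^2 v}{lv}$$

is arbitrarily near to $\sigma^2/v$ for sufficiently large $l$.

¹ Sur les valeurs limites des intégrales, Jour. Liouville, Ser. 2, T. XIX, 1874.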

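3. If $x$ is an arbitrary stochastic variable with given

$$E(x^2) = \sigma^2, \qquad E(x^4) = \mu_4$$

and $P$ denotes the probability of the inequality

$$|x| \ge k\sigma,$$

then

$$P \le \frac{1}{k^2} \quad \text{if } 1 \le k^2 \le \frac{\mu_4}{\sigma^4},$$

$$P \le \frac{\mu_4 - \sigma^4}{\mu_4 + k^4\sigma^4 - 2k^2\sigma^4} \quad \text{if } k^2 \ge \frac{\mu_4}{\sigma^4}.$$

These inequalities cannot be improved.

Hint: Follows from Tshebysheff's problem.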

4. Let $x_i$ assume two values, $i$ and $-i$, with equal probabilities. Show that the
law of large numbers cannot be applied to variables $x_1, x_2, x_3, \ldots$.

5. Variables $x_1, x_2, x_3, \ldots$ each assume two values:
$\log a$ or $-\log a$; $\log (a + 1)$ or $-\log (a + 1)$; $\log (a + 2)$ or $-\log (a + 2)$; $\ldots$
with equal probabilities. Show that the law of large numbers holds for these variables.

Hint: $E(x_i) = 0$; $i = 1, 2, 3, \ldots$

$$B_n = E(x_1 + x_2 + \cdots + x_n)^2 = \sum_{i=0}^{n-1} \{\log (a + i)\}^2 \sim (a + n - 1)\{\log (a + n - 1)\}^2$$

as can easily be established by using Euler's summation formula (Appendix 1, page
347). Hence

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty.$$


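6. If $x_i$ can have only two values with equal probabilities, $i^\alpha$ and $-i^\alpha$, show that the
law of large numbers can be applied to $x_1, x_2, \ldots$ if $\alpha < \frac{1}{2}$.

Hint:

$$B_n = 1^{2\alpha} + 2^{2\alpha} + \cdots + n^{2\alpha} \sim \frac{n^{2\alpha+1}}{2\alpha + 1}, \qquad \frac{B_n}{n^2} \to 0 \quad \text{if } \alpha < \frac{1}{2}.$$

It can be shown that the law of large numbers does not hold if $\alpha \ge \frac{1}{2}$.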

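7. In an indefinite Bernoullian series of trials with the constant probability $p$,
let $m_i$ denote the number of successes in the first $i$ trials. Show that the law of large
numbers holds for variables

$$x_i = \frac{m_i - ip}{(ipq)^\alpha}; \quad i = 1, 2, 3, \ldots$$

if $\alpha > \frac{1}{2}$.

Hint: Evidently $E(x_i) = 0$, $E(x_i^2) = (ipq)^{1-2\alpha}$ and

$$B_n = \sum_{i=1}^{n} E(x_i^2) + 2\sum_{j>i} E(x_i x_j).$$

Now

$$E(x_i x_j) = (ij)^{-\alpha}(pq)^{-2\alpha}E(m_i - ip)^2 + (ij)^{-\alpha}(pq)^{-2\alpha}E[(m_i - ip)(m_j - m_i - (j - i)p)] = (pq)^{1-2\alpha}\, i^{1-\alpha} j^{-\alpha}$$

since $m_i - ip$ and $m_j - m_i - (j - i)p$ are independent variables. Thus

$$B_n = (pq)^{1-2\alpha}\Big\{\sum_{i=1}^{n} i^{1-2\alpha} + 2\sum_{j>i} i^{1-\alpha} j^{-\alpha}\Big\}$$

and it is easy to show that

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

provided $\alpha > \frac{1}{2}$. But the law of large numbers no longer holds if $\alpha \le \frac{1}{2}$. The
proof of this is more difficult.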

8. The following extension of Tshebysheff's lemma was indicated by Kolmogoroff.
Let $x_1, x_2, \ldots x_n$ be independent variables; $E(x_i) = 0$, $E(x_i^2) = b_i$,

$$B_n = b_1 + b_2 + \cdots + b_n$$

and

$$s_k = x_1 + x_2 + \cdots + x_k; \quad k = 1, 2, \ldots n.$$

Denoting by $P$ the probability of the inequality

(A) $\max (s_1^2, s_2^2, \ldots, s_n^2) > B_n t^2$

we shall have $P < 1/t^2$.

Indication of the Proof. The inequality (A) can materialize if and only if one of
the following mutually exclusive events occurs:

event $e_1$: $s_1^2 > B_n t^2$;

event $e_2$: $s_1^2 \le B_n t^2$; $s_2^2 > B_n t^2$;

event $e_3$: $s_1^2 \le B_n t^2$; $s_2^2 \le B_n t^2$; $s_3^2 > B_n t^2$;

. . . . . . . . . .

event $e_n$: $s_1^2 \le B_n t^2$; $s_2^2 \le B_n t^2$; $\ldots$ $s_{n-1}^2 \le B_n t^2$; $s_n^2 > B_n t^2$.

If $(e_i)$ represents the probability of $e_i$ $(i = 1, 2, \ldots n)$ then

$$P = (e_1) + (e_2) + \cdots + (e_n).$$

Now consider the conditional mathematical expectation $E(s_n^2 \mid e_k)$ of $s_n^2$ given that
$e_k$ has occurred. Since the indication of $e_k$ does not affect variables $x_{k+1}, \ldots x_n$,
these variables and $s_k$ are independent. Hence

$$E(s_n^2 \mid e_k) = E(s_k^2 \mid e_k) + b_{k+1} + \cdots + b_n > B_n t^2.$$

On the other hand

$$B_n = E(s_n^2) \ge \sum_{k=1}^{n} (e_k) E(s_n^2 \mid e_k) > B_n t^2 \{(e_1) + (e_2) + \cdots + (e_n)\}$$

whence $P < 1/t^2$.
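Kolmogoroff's inequality can be checked exactly in the simplest special case. The sketch below assumes variables taking the values $+1$ and $-1$ with equal probabilities (so $b_i = 1$ and $B_n = n$); this choice and the function name are illustrative assumptions, not part of the problem.

```python
def prob_max_exceeds(n: int, t: float) -> float:
    """Exact P(max_k s_k**2 > n*t**2) for a symmetric +-1 walk of n steps."""
    limit = n * t * t  # B_n * t**2 with B_n = n
    # Distribution of s_k restricted to paths that have not yet exceeded the limit.
    alive = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for pos, pr in alive.items():
            for step in (-1, 1):
                new = pos + step
                if new * new <= limit:
                    nxt[new] = nxt.get(new, 0.0) + 0.5 * pr
        alive = nxt
    return 1.0 - sum(alive.values())


if __name__ == "__main__":
    for n in (10, 50):
        for t in (1.5, 2.0, 3.0):
            print(n, t, prob_max_exceeds(n, t), "bound:", 1.0 / t**2)
```

In every case printed, the exact probability stays below $1/t^2$, usually by a wide margin, which is typical of inequalities of Tshebysheff's type.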

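9. The Strong Law of Large Numbers (Kolmogoroff). Using the same notations
as in the preceding problem, show that the probability of the simultaneous inequalities

$$\left|\frac{s_n}{n}\right| \le \epsilon, \quad \left|\frac{s_{n+1}}{n+1}\right| \le \epsilon, \quad \left|\frac{s_{n+2}}{n+2}\right| \le \epsilon, \quad \ldots$$

will be greater than $1 - \eta$, provided $n$ exceeds a certain limit depending on the choice
of $\epsilon$ and $\eta$, and granted the convergence of the series

$$\sum_{k=1}^{\infty} \frac{b_k}{k^2}.$$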

Indication of the Proof. Consider variables 


Ti = max. 



for Vi 


2*-i» <2^; ,i ^ 1, 2, 3, . , . 


and denote by the probability of the inequality n > By Kolmogoroff^s 
lemma 


Z=s2*n — 1 

■4 6i 
“ 2*‘-W 
and

« — 1 oo Zas2»n — 1 

5i +32 +9, + • • • X bt< 16e-»2 | 

ts=l Z=2*~i7i 1 = 1 

or 

00 

S’! + ffz + ?3 4- * * • < ^ ~ - 

k^n 

Hence, the probability of fulfillment of all the inequalities ri ^ Me; i = 1, 2, 3, . . . 
is greater than 


- 


k — n 




The inequalities \8k/k\ ^ e;k = n, n 4- 1, 4- 2, . . . are satisfied when simul- 

taneously 

Tf ^ §€; t = 1, 2, 3, . . . 
and 


Srt— 1| 




4B 

The probability of the last inequality being greater than 1 -y the probability 

of simultaneous inequalities 

S e; fc = n, n 4“ Ij 4- 2, . . . 
a fortiori will be greater than 

\bk 4:Bn 


1 - 166' 


-2 


k^n 

This inequality suffices to complete the proof if we notice that BnM* tends to 0 when 
the series 


k^i 




is convergent, 

10. Let $x_1, x_2, \ldots x_n$ be identical stochastic variables and $E(x_i) = 0$. Denoting
by Pn(e) and Pn(e)j respectively, the probabilities of the inequalities 


aji 4- 4- 


+• a?n ^ , xi 4“ a;2 4“ 

> € and — 


+ Xn 


< — e 


show that 


.. Pn{e) 

lim = 0 or 4” 

n- ooP„(6) 


^according as E(xf) > or <0. 
For the proof see Khintchine's paper in Mathematische Annalen (vol. 101, pp. 381-385).

11. The Law of the Repeated Logarithm (Khintchine, Kolmogoroff). Let $x_1, x_2, \ldots x_n$
be bounded independent variables, $E(x_i) = 0$, $i = 1, 2, \ldots n$ and $B_n \to \infty$
as $n \to \infty$. For an arbitrarily small $\delta > 0$ and $\epsilon > 0$ and for an arbitrarily large $N$
one can choose $n_0 > N$ so that:

a. The probability of the fulfillment of the inequality

$$|s_n| > (1 + \delta)\sqrt{2B_n \log \log B_n}$$

for at least one $n \ge n_0$ is less than $\epsilon$.

b. The probability of the fulfillment of the inequality

$$|s_n| > (1 - \delta)\sqrt{2B_n \log \log B_n}$$

for at least one $n \ge n_0$ is greater than $1 - \epsilon$.

For the proof see Kolmogoroff's paper in Mathematische Annalen (vol. 101, pp. 126-135).

If $x_1, x_2, \ldots x_n$ are variables independent in pairs and $B_n$ the dispersion of their
sum $s = x_1 + x_2 + \cdots + x_n$, then the probability $P$ that

$$|s| \le t\sqrt{B_n}$$

satisfies the inequality

$$P > 1 - \frac{1}{t^2} \quad \text{(Tshebysheff's inequality)}$$

provided $E(x_i) = 0$, $i = 1, 2, \ldots n$, which can be assumed without loss of generality.
In case variables are totally independent and are subject to certain limitations of comparatively mild character, S. Bernstein has shown that Tshebysheff's inequality can be
considerably improved.

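12. Let $x_1, x_2, \ldots x_n$ be totally independent variables. We suppose $E(x_i) = 0$,
$E(x_i^2) = b_i$ and

$$|E(x_i^h)| \le \frac{b_i}{2}\, h!\, c^{h-2}$$

for $i = 1, 2, \ldots n$ and $h > 2$, $c$ being a certain constant. Show that

$$A = E\{e^{\epsilon(x_1 + x_2 + \cdots + x_n)}\} < e^{\frac{B_n \epsilon^2}{2(1 - \sigma)}}$$

where $\sigma$ is an arbitrary positive number $< 1$ and $\epsilon$ is a positive number so small that

$$\epsilon c \le \sigma.$$

Indication of the Proof. We have

$$e^{\epsilon x_i} = 1 + \epsilon x_i + \sum_{h=2}^{\infty} \frac{\epsilon^h x_i^h}{h!},$$

$$E(e^{\epsilon x_i}) \le 1 + \frac{b_i \epsilon^2}{2} \sum_{h=0}^{\infty} (\epsilon c)^h < e^{\frac{b_i \epsilon^2}{2(1 - \sigma)}}$$

whence, the variables being independent, the stated inequality for $A$ follows on multiplying these bounds.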
13. If $Q$ denotes the probability of the inequality

$$x_1 + x_2 + \cdots + x_n > \frac{B_n \epsilon}{2(1 - \sigma)} + \frac{t^2}{\epsilon}$$

show that $Q < e^{-t^2}$.

Indication of the Proof. If $Q'$ is the probability of the inequality

$$e^{\epsilon(x_1 + x_2 + \cdots + x_n)} > A e^{t^2}$$

then, by Tshebysheff's lemma, $Q' < e^{-t^2}$ and $Q \le Q'$ by Prob. 12.

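14. S. Bernstein's Inequality. Denoting by $P$ the probability of the inequality

$$|x_1 + x_2 + \cdots + x_n| \le \omega,$$

$\omega$ being a given positive number, show that

$$P > 1 - 2e^{-\frac{\omega^2}{2B_n + 2c\omega}}.$$

Indication of the Proof. To make

$$\frac{B_n \epsilon}{2(1 - \sigma)} + \frac{t^2}{\epsilon} = F$$

minimum take

$$\epsilon = t\sqrt{\frac{2(1 - \sigma)}{B_n}}; \quad \text{then} \quad F = t\sqrt{\frac{2B_n}{1 - \sigma}}$$

and $t$ is determined by equating $F$ to $\omega$. The resulting value of $\epsilon$,

$$\epsilon = \frac{\omega}{B_n}(1 - \sigma)$$

is admissible only if $\epsilon c \le \sigma$ or $\frac{c\omega}{B_n}(1 - \sigma) \le \sigma$. The best choice for $\sigma$ is

$$\sigma = \frac{c\omega}{B_n + c\omega}$$

and correspondingly

$$t = \frac{\omega}{\sqrt{2B_n + 2c\omega}}.$$

By Prob. 13 the probability of the inequality

$$x_1 + x_2 + \cdots + x_n > \omega$$

is less than $e^{-\frac{\omega^2}{2B_n + 2c\omega}}$; the same is true of the probability of the inequality

$$x_1 + x_2 + \cdots + x_n < -\omega \quad \text{or} \quad -x_1 - x_2 - \cdots - x_n > \omega.$$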

15. If variables $x_1, x_2, \ldots x_n$ are uniformly bounded and $M$ is an upper bound
of their numerical values, then we may take $c = M/3$.

Indication of the Proof. Note that

$$|E(x_i^h)| \le b_i M^{h-2}.$$


16. Consider a Poisson's series of trials with probabilities $p_1, p_2, \ldots p_n$ for an
event $E$ to occur. Let $m$ be the frequency of $E$ in $n$ trials,

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}, \qquad \lambda = \frac{1}{n}(p_1 q_1 + p_2 q_2 + \cdots + p_n q_n).$$

Show that the probability $P$ of the inequality

$$\left|\frac{m}{n} - p\right| \le \epsilon$$

has the following lower limit:

$$P > 1 - 2e^{-\frac{n\epsilon^2}{2\lambda + \frac{2}{3}\epsilon}}.$$

In the Bernoullian case $p_1 = p_2 = \cdots = p_n$, $\lambda = pq$ and consequently

$$P > 1 - 2e^{-\frac{n\epsilon^2}{2pq + \frac{2}{3}\epsilon}}.$$
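For Bernoullian trials the lower limit of this problem can be compared with Tshebysheff's and with the exact probability. The sketch below assumes the bound has the form obtained from Prob. 14 with $B_n = npq$, $\omega = n\epsilon$ and $c = 1/3$, namely $1 - 2e^{-n\epsilon^2/(2pq + 2\epsilon/3)}$; the function names are illustrative.

```python
import math


def bernstein_bound(n: int, p: float, eps: float) -> float:
    """Assumed form of the lower limit: 1 - 2*exp(-n*eps^2 / (2pq + 2*eps/3))."""
    q = 1.0 - p
    return 1.0 - 2.0 * math.exp(-n * eps * eps / (2.0 * p * q + 2.0 * eps / 3.0))


def tshebysheff_bound(n: int, p: float, eps: float) -> float:
    """Lower limit 1 - pq/(n*eps^2) from Tshebysheff's inequality."""
    return 1.0 - p * (1.0 - p) / (n * eps * eps)


def exact_probability(n: int, p: float, eps: float) -> float:
    """Exact P(|m/n - p| <= eps) by summing binomial terms (moderate n only)."""
    q = 1.0 - p
    total = 0.0
    for m in range(n + 1):
        if abs(m / n - p) <= eps:
            total += math.comb(n, m) * p**m * q ** (n - m)
    return total


if __name__ == "__main__":
    n, p, eps = 1000, 0.5, 0.05
    print(tshebysheff_bound(n, p, eps), bernstein_bound(n, p, eps), exact_probability(n, p, eps))
```

For $n = 1000$, $p = 1/2$, $\epsilon = 0.05$ the exponential bound is already much closer to the exact value than Tshebysheff's.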


17. An indefinite series of totally independent variables $x_1, x_2, x_3, \ldots$ has the
property that the mathematical expectation of any odd power of these variables is
rigorously $= 0$ while

$$E(x_i^{2k}) \le \left(\frac{b_i}{2}\right)^k \frac{(2k)!}{k!}, \qquad b_i = E(x_i^2)$$

for $i = 1, 2, 3, \ldots$. Prove that the probability of either one of the inequalities

$$x_1 + x_2 + \cdots + x_n > t\sqrt{2B_n} \quad \text{or} \quad x_1 + x_2 + \cdots + x_n < -t\sqrt{2B_n}$$

where $B_n = b_1 + b_2 + \cdots + b_n$ is less than $e^{-t^2}$ (S. Bernstein). Prove first that

$$E(e^{\epsilon x_i}) \le e^{\frac{b_i \epsilon^2}{2}}.$$

18. Positive and negative proper decimal fractions limited to, say, five decimals,
are obtained in the following manner: From an urn containing tickets with numbers
0, 1, 2, ... 9 in equal proportion, five tickets are drawn in succession (the ticket
drawn in a previous trial being returned before the next) and their respective numbers
are written in succession as five decimals of a proper fraction. This fraction, if not
equal to 0, is preceded by the sign $+$ or $-$, according as a coin tossed at the same time
shows heads or tails. Thus, repeating this process several times, we may obtain as
many positive or negative proper fractions with five decimals as we desire. What
can be said about the probability that the sum of $n$ such fractions will be contained
between prescribed limits $-\omega$ and $\omega$? Ans. These $n$ fractions may be considered as
so many identical stochastic variables for each of which

$$E(x^{2k+1}) = 0, \qquad b = E(x^2) = \frac{(1 - 10^{-5})(2 - 10^{-5})}{6} < \frac{1}{3}.$$

Besides,

$$E(x^{2k}) = \frac{1}{10^{10k+5}} \sum_{i=1}^{10^5 - 1} i^{2k} < \frac{1}{2k + 1}$$

since in general

$$1^{2k} + 2^{2k} + \cdots + (s - 1)^{2k} < \frac{s^{2k+1}}{2k + 1}.$$

Again, the inequality

$$E(x^{2k}) \le \left(\frac{b}{2}\right)^k \frac{(2k)!}{k!}$$

can easily be verified and we can apply the result of Prob. 17. For the required
probability $P$ the following lower limit can be obtained:

$$P > 1 - 2e^{-\frac{\omega^2}{2nb}} > 1 - 2e^{-\frac{3\omega^2}{2n}},$$

or, if $\omega = n\epsilon$

$$P > 1 - 2e^{-\frac{3}{2}n\epsilon^2}.$$

For example, if $\epsilon = \frac{1}{10}$ and $n \ge 814$,

$$P > 0.99999,$$

that is, almost certainly the sum of 814 fractions formed in the above described manner will be contained between $-82$ and $82$.
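The threshold $n = 814$ in this example is easy to confirm. The following sketch evaluates the lower limit $1 - 2e^{-\frac{3}{2}n\epsilon^2}$ and searches for the smallest $n$ at which it exceeds a prescribed level; the function names are illustrative.

```python
import math


def lower_limit(n: int, eps: float) -> float:
    """Lower limit 1 - 2*exp(-3*n*eps^2/2) for the probability that |sum| <= n*eps."""
    return 1.0 - 2.0 * math.exp(-1.5 * n * eps * eps)


def smallest_n(eps: float, target: float) -> int:
    """Smallest n for which the lower limit exceeds `target` (plain linear search)."""
    n = 1
    while lower_limit(n, eps) <= target:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_n(0.1, 0.99999))
```

With $\epsilon = 1/10$ the condition $2e^{-0.015n} < 10^{-5}$ amounts to $n > \log(2 \cdot 10^5)/0.015 \approx 813.7$, in agreement with the value quoted in the answer.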
CHAPTER XI

APPLICATIONS OF THE LAW OF LARGE NUMBERS 

1. A theorem of such wide generality as the law of large numbers is a 
source of a great many important particular theorems. We shall begin 
with a generalization of Bernoulli's theorem due to Poisson. 

Let us consider a series of independent trials with the respective
probabilities $p_1, p_2, p_3, \ldots$, varying from one trial to another. Considering $n$ trials, we shall denote by $m$ the number of successes. The
arithmetic mean of probabilities in $n$ trials

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}$$

will be called the "mean probability in $n$ trials." With such conditions
and notations adopted, we can state Poisson's theorem as follows:

Poisson's Theorem. The probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for fixed $\epsilon > 0$, no matter how small, can be made as near to 1 (certainty) as
we please, provided the number of trials $n$ is sufficiently large.

Proof. To show that this theorem is but a particular case of the law
of large numbers, we use an artifice often applied in similar circumstances, namely, we associate with trials $1, 2, 3, \ldots n$ variables $x_1, x_2, x_3, \ldots x_n$ defined as follows:

$x_i = 1$ in case of success in the $i$th trial,

$x_i = 0$ in case of failure in the $i$th trial.

Since the trials are independent, these variables are also independent.
Moreover

$$E(x_i) = E(x_i^2) = p_i$$

and the dispersion of $x_i$ is

$$p_i - p_i^2 = p_i q_i.$$

The dispersion $B_n$ of the sum

$$x_1 + x_2 + \cdots + x_n$$

is the sum of the dispersions of its terms, that is,

$$B_n = p_1 q_1 + p_2 q_2 + \cdots + p_n q_n \le \frac{n}{4}.$$

At the same time, the former sum represents the number of successes $m$.

Now, applying the results established in Chap. X, Sec. 2, we arrive
at this conclusion: Denoting by $P$ the probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

we shall have

$$P > 1 - \frac{B_n}{n^2\epsilon^2} \ge 1 - \frac{1}{4n\epsilon^2}.$$

It now suffices to take

$$n > \frac{1}{4\epsilon^2\eta}$$

to have

$$P > 1 - \eta$$

where $\eta$ is an arbitrary positive number no matter how small. That
completes the proof of Poisson's theorem.
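The two quantities used in this proof, the lower limit $1 - B_n/(n^2\epsilon^2)$ and the sufficient number of trials $n > 1/(4\epsilon^2\eta)$, can be computed directly. The sketch below uses exact rational arithmetic; the function names and the sample probabilities are illustrative assumptions.

```python
import math
from fractions import Fraction


def tshebysheff_lower_limit(probabilities, eps):
    """Lower limit 1 - B_n/(n^2 eps^2) for the probability of |m/n - p| < eps."""
    n = len(probabilities)
    # Dispersion of the number of successes: sum of p_i * q_i.
    b_n = sum(p * (1 - p) for p in probabilities)
    return 1 - b_n / (n * n * eps * eps)


def sufficient_trials(eps, eta):
    """Smallest integer n satisfying n > 1/(4 eps^2 eta)."""
    return math.floor(1 / (4 * eps * eps * eta)) + 1


if __name__ == "__main__":
    eps, eta = Fraction(1, 10), Fraction(1, 100)
    print(sufficient_trials(eps, eta))
    # Hypothetical probabilities alternating between 1/3 and 1/2.
    trials = [Fraction(1, 3), Fraction(1, 2)] * 1251
    print(float(tshebysheff_lower_limit(trials, eps)))
```

Because the estimate $B_n \le n/4$ is used, the sufficient number of trials does not depend on the particular probabilities $p_i$.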

Evidently Bernoulli's theorem is contained in Poisson's theorem as a
particular case when

$$p_1 = p_2 = \cdots = p_n = p.$$

Poisson himself attached great importance to his theorem and adopted
for it the name of the "law of large numbers," which is still used by many
authors. However, it appears more proper to reserve this name to the
theorem established in Chap. X, Sec. 2, which is due to Tshebysheff.

2. Let us consider $n$ series each consisting of $s$ independent trials with
the constant probability $p$. Also, let

$$m_1, m_2, \ldots m_n$$

represent the number of successes in each of these $n$ series. Stochastic
variables

$$x_1 = (m_1 - sp)^2, \quad x_2 = (m_2 - sp)^2, \quad \ldots \quad x_n = (m_n - sp)^2$$

are independent and identical. Their common mathematical expectation is $spq$. The law of large numbers can be applied to these variables
and leads immediately to the conclusion: The probability of the inequality

$$\left|\frac{\sum_{i=1}^{n} (m_i - sp)^2}{n} - spq\right| < \epsilon$$

can be brought as near as we please to 1 (or certainty) if the number of
series $n$ is sufficiently large.

Substituting $\epsilon spq$ for $\epsilon$ and dividing through by $spq$, we may state the
same proposition as follows: The probability of the inequalities

$$1 - \epsilon < \frac{\sum_{i=1}^{n} (m_i - sp)^2}{Npq} < 1 + \epsilon,$$

where $N = ns$ is the total number of trials in all $n$ series, can be brought
as near to 1 as we please if the number of series is sufficiently large.

The law of large numbers can be legitimately applied to the variables

$$x_i = |m_i - sp|; \quad i = 1, 2, 3, \ldots$$

with the common mathematical expectation

$$M_s = 2spq\, C_{s-1}^{\mu-1} p^{\mu-1} q^{s-\mu}$$

where $\mu = [sp + 1]$, and leads to the following proposition: The probability of the inequalities

$$1 - \epsilon < \frac{\sum_{i=1}^{n} |m_i - sp|}{nM_s} < 1 + \epsilon$$

can be brought as near to 1 as we please if the number of series is sufficiently large.

For the sake of simplicity, let us use the notations

$$A = \sqrt{\frac{\sum_{i=1}^{n} (m_i - sp)^2}{n}}, \qquad B = \frac{\sum_{i=1}^{n} |m_i - sp|}{n}.$$

The probabilities $P$ and $P'$ of the inequalities

(1) $\sqrt{spq}\,(1 - \sigma) < A < \sqrt{spq}\,(1 + \sigma)$

(2) $M_s(1 - \sigma) < B < M_s(1 + \sigma)$

which are equivalent to

$$(1 - \sigma)^2 < \frac{\sum_{i=1}^{n} (m_i - sp)^2}{nspq} < (1 + \sigma)^2$$

$$1 - \sigma < \frac{\sum_{i=1}^{n} |m_i - sp|}{nM_s} < 1 + \sigma$$

can both be made greater than $1 - \eta$, where $\eta$ is an arbitrarily small
positive number. The probability of simultaneous materialization of
(1) and (2) is not less than

$$P + P' - 1 > 1 - 2\eta.$$

But whenever (1) and (2) hold simultaneously, we have

(3) $\dfrac{\sqrt{spq}}{M_s} \cdot \dfrac{1 - \sigma}{1 + \sigma} < \dfrac{A}{B} < \dfrac{\sqrt{spq}}{M_s} \cdot \dfrac{1 + \sigma}{1 - \sigma}.$

Therefore the probability of these inequalities is again $> 1 - 2\eta$. Now
let us take

$$\sigma = \frac{\tau}{2 + \tau}$$

where $\tau$ is another positive number arbitrarily chosen. Then

$$\frac{1 + \sigma}{1 - \sigma} = 1 + \tau; \qquad \frac{1 - \sigma}{1 + \sigma} > 1 - \tau.$$

Hence, the inequalities

$$\frac{\sqrt{spq}}{M_s}(1 - \tau) < \frac{A}{B} < \frac{\sqrt{spq}}{M_s}(1 + \tau)$$

follow from inequalities (3) and their probability is a fortiori $> 1 - 2\eta$.
It suffices to take

$$\tau = \frac{\epsilon M_s}{\sqrt{spq}}$$

to arrive at the following proposition:
The probability of the inequality

$$\left|\frac{A}{B} - \frac{\sqrt{spq}}{M_s}\right| < \epsilon$$

for a fixed $\epsilon$ and sufficiently large number of series can be made as near to
1 as we please.

If $spq$ is somewhat large, the quotient

$$\frac{\sqrt{spq}}{M_s}$$

differs but little from $\sqrt{\pi/2}$ (see Chap. IX, Prob. 2, page 177). Hence,
when the number of series is large and the series themselves sufficiently
long, we may expect with great probability that the quotient

$$\frac{A}{B}$$

will not differ much from $\sqrt{\pi/2}$.
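The closed expression $M_s = 2spq\,C_{s-1}^{\mu-1}p^{\mu-1}q^{s-\mu}$ with $\mu = [sp + 1]$, and the nearness of $\sqrt{spq}/M_s$ to $\sqrt{\pi/2}$, can both be checked numerically. This is a sketch for moderate $s$ only (floating-point binomial terms); the function names are illustrative.

```python
import math


def mean_abs_deviation(s: int, p: float) -> float:
    """M_s = 2*s*p*q*C(s-1, mu-1)*p**(mu-1)*q**(s-mu), with mu = [s*p + 1]."""
    q = 1.0 - p
    mu = math.floor(s * p + 1)
    return 2 * s * p * q * math.comb(s - 1, mu - 1) * p ** (mu - 1) * q ** (s - mu)


def mean_abs_deviation_direct(s: int, p: float) -> float:
    """E|m - s*p| computed term by term from the binomial distribution."""
    q = 1.0 - p
    return sum(abs(m - s * p) * math.comb(s, m) * p**m * q ** (s - m) for m in range(s + 1))


def ratio(s: int, p: float) -> float:
    """The quotient sqrt(s*p*q) / M_s considered in the text."""
    return math.sqrt(s * p * (1.0 - p)) / mean_abs_deviation(s, p)


if __name__ == "__main__":
    print(mean_abs_deviation(50, 0.5), mean_abs_deviation_direct(50, 0.5))
    for s in (50, 400):
        print(s, ratio(s, 0.5), math.sqrt(math.pi / 2))
```

The agreement with $\sqrt{\pi/2}$ improves as $spq$ grows, as the text indicates.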
DIVERGENCE COEFFICIENT

3. The considerations of the preceding section can be generalized.
Let us consider again $n$ series containing $s$ trials each, and let

$$m_1, m_2, \ldots m_n$$

represent the numbers of successes in each of these series. Without
specifying the nature of the trials (which can be independent or dependent) we shall denote by $p$ the mean probability in all $N = ns$ trials and
by $q = 1 - p$ its complement. Again considering the quotient

$$Q = \frac{\sum_{i=1}^{n} (m_i - sp)^2}{Npq},$$

we seek its mathematical expectation

$$E(Q) = D.$$

When all the $N$ trials are of the Bernoullian type, $D = 1$. But it is also
possible to imagine cases when $D > 1$ or $D < 1$. Lexis calls $\sqrt{D}$ the
"coefficient of dispersion." We shall call $D$ itself the "theoretical
divergence coefficient." If $m_1, m_2, \ldots m_n$ are actually observed frequencies in $n$ series, the quotient

$$D' = \frac{\sum_{i=1}^{n} (m_i - sp)^2}{Npq}$$

may be called "empirical divergence coefficient." Then, if the law of
large numbers can be applied to variables

$$x_i = \frac{(m_i - sp)^2}{spq}; \quad i = 1, 2, 3, \ldots$$

we can expect with probability, approaching certainty as near as we please,
that the inequality

$$|D' - D| < \epsilon$$

will be fulfilled for an adequately large number of series.

Thus far we have not specified the nature of the trials. Now we shall
suppose that all $N = ns$ trials, distributed in $n$ series, are independent
but with probabilities varying in general from trial to trial. Let

$$p_{1i}, p_{2i}, \ldots p_{si} \quad (i = 1, 2, \ldots n)$$

be the probabilities in successive trials of the $i$th series. Their mean

$$p_i = \frac{p_{1i} + p_{2i} + \cdots + p_{si}}{s}$$

is the mean probability in the $i$th series. Finally

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}$$

is the mean probability in all $N = ns$ trials. As to the expectation of
$(m_i - sp)^2$, we find

$$E(m_i - sp)^2 = E(m_i - sp_i + s(p_i - p))^2 = E(m_i - sp_i)^2 + s^2(p_i - p)^2$$

since

$$E(m_i - sp_i) = 0.$$


On the other hand, 
Lexis' Case. Probabilities remain the same within each series,
but vary from series to series. In this case $p_{ji} = p_i$ and the expression of
$D$ becomes:

$$D = 1 + \frac{s - 1}{npq} \sum_{i=1}^{n} (p_i - p)^2.$$

The theoretical divergence coefficient in this case is always greater than
1 and may be arbitrarily large.

Poisson's Case. The probabilities of the corresponding trials in all
series are the same, so that

$$p_{ji} = \pi_j$$

and

$$p = p_i = \frac{\pi_1 + \pi_2 + \cdots + \pi_s}{s}.$$

In this case the divergence coefficient

$$D = 1 - \frac{\sum_{j=1}^{s} (p - \pi_j)^2}{spq}$$

is always less than 1.

Since the law of large numbers evidently is applicable to variables

$$x_i = \frac{(m_i - sp)^2}{spq},$$

we may expect that the empirical divergence coefficient $D'$ will not
differ much from $D$ if the number of series is sufficiently large.

For numerical illustration let us consider 100 series each containing
100 trials, such that in 50 series the probability is $\frac{2}{5}$ and in the remaining
50 series it is $\frac{3}{5}$. Here we evidently have Lexis' case. The mean
probability in all trials is

$$p = \frac{1}{2}$$

and

$$\sum_{i=1}^{100} (p_i - p)^2 = 50 \cdot \frac{1}{100} + 50 \cdot \frac{1}{100} = 1.$$

Finally,

$$D = 1 + \frac{99}{25} = 4.96.$$

Now, suppose that we combine in pairs series of 100 trials with
probability $\frac{2}{5}$ and series of 100 trials with probability $\frac{3}{5}$ to form 50
series each of 200 trials. Evidently we have here Poisson's case. The
mean probability in each series again is $p = \frac{1}{2}$ and

$$\sum_{j=1}^{200} \left(\frac{1}{2} - \pi_j\right)^2 = 100 \cdot \frac{1}{100} + 100 \cdot \frac{1}{100} = 2.$$

Finally,

$$D = 1 - \frac{2}{50} = 0.96.$$
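Both numerical values can be recovered from the general relation $E(m_i - sp)^2 = E(m_i - sp_i)^2 + s^2(p_i - p)^2$, taking $E(m_i - sp_i)^2$ as the sum of the dispersions $p_{ji}q_{ji}$ of the independent trials in the series. The sketch below assumes the two probabilities are $2/5$ and $3/5$ (values inferred from the printed sums 1 and 2 and from $p = 1/2$); the function name is illustrative.

```python
from fractions import Fraction


def divergence_coefficient(series):
    """Theoretical divergence coefficient D = E(Q) for independent trials.

    `series` is a list of n lists, each holding the s probabilities of one series.
    """
    n = len(series)
    s = len(series[0])
    total_trials = n * s
    p = sum(sum(row) for row in series) / total_trials  # mean probability in all trials
    q = 1 - p
    expected_sum = 0
    for row in series:
        p_i = sum(row) / s                           # mean probability in this series
        dispersion = sum(x * (1 - x) for x in row)   # E(m_i - s*p_i)^2
        expected_sum += dispersion + s * s * (p_i - p) ** 2
    return expected_sum / (total_trials * p * q)


low, high = Fraction(2, 5), Fraction(3, 5)
lexis = [[low] * 100] * 50 + [[high] * 100] * 50   # 100 series of 100 trials
poisson = [[low] * 100 + [high] * 100] * 50        # 50 series of 200 trials

if __name__ == "__main__":
    print(float(divergence_coefficient(lexis)))
    print(float(divergence_coefficient(poisson)))
```

The same 10,000 trials thus give a coefficient well above 1 or slightly below 1 according to the way they are grouped into series.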

The consideration of the divergence coefficient may be useful in 
testing the assumed independence of trials and values of probabilities 
attached to these trials. In the simplest case of Bernoullian trials with 
a constant and known probability, the theoretical divergence coefficient 
is 1. Now, if the number of series is sufficiently large and the empirical 
divergence coefficient turns out to be considerably different from 1, 
we must admit with great probability that the trials we deal with are not 
of the supposed type. If, however, the empirical divergence coefficient 
turns out to be near 1, that does not conclusively prove the hypothesis 
concerning the independence of trials and the assumed value of the 
probability. It only makes this hypothesis plausible. 

There are cases of dependent trials (complex chains considered by 
Markoff) in which the theoretical divergence coefficient is exactly 1 and 
the probability of an event has the same constant value in each trial, 
insofar as the results of other trials remain unknown. Cases like that 
may easily be mistaken for Bernoullian trials without further detailed 
study of the entire course of trials. 

4. When there is good reason to believe that the trials are independent
with a constant but unknown probability, we cannot in all rigor find the
value of the empirical divergence coefficient

$$D' = \frac{\sum_{i=1}^{n} (m_i - sp)^2}{Npq}$$

to compare it with the theoretical divergence coefficient $D = 1$, since $p$
remains unknown.

But, relying on Bernoulli's theorem, we can take the quotient

$$\frac{M}{N}$$

where

$$M = m_1 + m_2 + \cdots + m_n$$

as an approximate value of $p$. By taking $p = M/N$ in the preceding
expression for $D'$ we get another number

$$D'' = \frac{N \sum_{i=1}^{n} \left(m_i - \frac{M}{n}\right)^2}{M(N - M)}$$

which in general is close to $D'$. However, considering $m_1, m_2, \ldots m_n$
not as observed but as eventual numbers of successes in $n$ series, the
mathematical expectation of $D''$ is different from 1. To avoid this
difficulty, it is better to consider a slightly different quotient

$$Q = \frac{n(N - 1) \sum_{i=1}^{n} \left(m_i - \frac{M}{n}\right)^2}{(n - 1)M(N - M)}.$$

For this quotient there exists a theorem discovered and proved for the
first time by the eminent Russian statistician Tschuprow.

Theorem. The mathematical expectation of $Q$ is rigorously equal to 1.

Proof. Here we shall develop the proof given by Markoff. The
above given expression of $Q$ presents itself in the form $\frac{0}{0}$ and therefore
has no meaning in two cases: $M = 0$ or $M = N$. For these exceptional
cases we set $Q = 1$ by definition. If neither $M = 0$ nor $M = N$, we
can present $Q$ in the form

(4) $Q = \dfrac{n(N - 1)}{n - 1} \cdot \dfrac{\sum_{i=1}^{n} m_i^2 - \frac{M^2}{n}}{M(N - M)}.$

Considering $m_1, m_2, \ldots m_n$ as stochastic variables assuming integral
values from 0 to $s$, the probability of a definite system of values

$$m_1, m_2, \ldots m_n$$

is

$$P = \frac{s!}{m_1!(s - m_1)!} \cdot \frac{s!}{m_2!(s - m_2)!} \cdots \frac{s!}{m_n!(s - m_n)!}\, p^M q^{N-M}.$$

To get the expectation of $Q$ we must multiply it by $P$ and take the
sum

$$E(Q) = \sum PQ$$

extended over all non-negative integers $m_1, m_2, \ldots m_n$, each of them
not exceeding $s$. To perform this multiple summation we first collect
all terms with a given sum

$$m_1 + m_2 + \cdots + m_n = M.$$

¹ The theorem itself and its proof given by Markoff can be extended to the case of
series of unequal length.
Let the result of this summation be $S_M$. Then it remains to take the
sum

$$\sum_{M=0}^{N} S_M$$

to have the desired expression $E(Q)$. To this end we first separate two
terms corresponding to $M = 0$ and $M = N$. In the former case

$$m_1 = m_2 = \cdots = m_n = 0$$

and the probability of such an event is $q^N$ while $Q = 1$. In the latter
case

$$m_1 = m_2 = \cdots = m_n = s$$

the probability of which is $p^N$ while again $Q = 1$. Thus

$$E(Q) = p^N + q^N + \sum_{M=1}^{N-1} S_M.$$

To find $S_M$ we observe that the denominator of $Q$ has a constant value
when summation is performed over variable integers $m_1, m_2, \ldots m_n$
connected by the relation

$$m_1 + m_2 + \cdots + m_n = M.$$

Hence, it suffices to find two sums

$$\sum P \quad \text{and} \quad \sum P m_1^2$$

extended over integers $m_1, m_2, \ldots m_n$ varying within limits 0 and $s$
and having the sum $M$. To this end consider the function

$$V = (pte^{\xi_1} + q)^s (pte^{\xi_2} + q)^s \cdots (pte^{\xi_n} + q)^s$$

involving $n + 1$ arbitrary variables $t, \xi_1, \xi_2, \ldots \xi_n$. When developed,
$V$ consists of terms of the form

$$P\, t^{m_1 + m_2 + \cdots + m_n} e^{m_1\xi_1 + m_2\xi_2 + \cdots + m_n\xi_n}.$$

Evidently we obtain the sum $\sum P$ by setting $\xi_1 = \xi_2 = \cdots = \xi_n = 0$
and taking the coefficient of $t^M$ in the expansion

$$V_{\xi_1 = \cdots = \xi_n = 0} = (pt + q)^N.$$

Thus

(5) $\sum P = \dfrac{N!}{M!(N - M)!}\, p^M q^{N-M}.$

To find $\sum P m_1^2$ take the second derivative

$$\frac{d^2 V}{d\xi_1^2}$$
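and after setting $\xi_1 = \xi_2 = \cdots = \xi_n = 0$, expand

$$\left(\frac{d^2 V}{d\xi_1^2}\right)_{\xi_1 = \cdots = \xi_n = 0} = spt(pt + q)^{N-1} + s(s - 1)p^2 t^2 (pt + q)^{N-2}$$

and take the coefficient of $t^M$. Thus we find

(6) $\sum P m_1^2 = \left\{ s\dfrac{(N - 1)!}{(M - 1)!(N - M)!} + s(s - 1)\dfrac{(N - 2)!}{(M - 2)!(N - M)!} \right\} p^M q^{N-M}.$

Referring to (4), (5), and (6), we easily get

$$S_M = p^M q^{N-M}\, \frac{n(N - 1)}{(n - 1)M(N - M)} \cdot \frac{(N - 2)!\,N}{n(M - 1)!(N - M)!}\, \{nN - n + (N - n)(M - 1) - M(N - 1)\}$$

or, after obvious simplifications,

$$S_M = \frac{N!}{M!(N - M)!}\, p^M q^{N-M}.$$

Hence

$$\sum_{M=1}^{N-1} S_M = (p + q)^N - p^N - q^N = 1 - p^N - q^N$$

and finally

$$E(Q) = 1.$$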
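For small $n$ and $s$ the theorem can be verified by direct enumeration, which makes a useful check on the definitions (including the convention $Q = 1$ when $M = 0$ or $M = N$). The following sketch uses exact rational arithmetic; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def expected_q(n: int, s: int, p: Fraction) -> Fraction:
    """E(Q) by summing P*Q over all systems (m_1, ..., m_n) with 0 <= m_i <= s."""
    q = 1 - p
    big_n = n * s
    total = Fraction(0)
    for ms in product(range(s + 1), repeat=n):
        big_m = sum(ms)
        prob = Fraction(1)
        for m in ms:
            prob *= comb(s, m) * p**m * q ** (s - m)
        if big_m == 0 or big_m == big_n:
            quotient = Fraction(1)  # exceptional cases, Q = 1 by definition
        else:
            spread = sum(m * m for m in ms) - Fraction(big_m * big_m, n)
            quotient = Fraction(n * (big_n - 1)) * spread / ((n - 1) * big_m * (big_n - big_m))
        total += prob * quotient
    return total


if __name__ == "__main__":
    print(expected_q(3, 4, Fraction(1, 3)))
    print(expected_q(2, 5, Fraction(3, 7)))
```

The enumeration has $(s + 1)^n$ terms, so it is practical only for small cases; the point is that the result is exactly 1 and not merely close to it.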


Markoff, using the same method, succeeded in finding the explicit
expression of the expectation

$$E(Q - 1)^2.$$

Since there is no difficulty in finding this expression except for somewhat tedious calculations, we give it here without entering into details
of the proof:

$$E(Q - 1)^2 = \frac{2N(N - n)}{(n - 1)(N - 2)(N - 3)} \sum_{M=1}^{N-1} \frac{M - 1}{M} \cdot \frac{N - M - 1}{N - M} \cdot \frac{N!}{M!(N - M)!}\, p^M q^{N-M}$$

whence the following inequality immediately follows:

$$E(Q - 1)^2 < \frac{2N(N - n)}{(n - 1)(N - 2)(N - 3)}.$$

In case $n \ge 5$ a still simpler inequality holds:

(7) $E(Q - 1)^2 < \dfrac{2}{n - 1}.$
Let $R$ be the probability of the inequality

where $\epsilon$ is a positive number. Applying the same reasoning to inequality
(7) as was used in establishing Tshebysheff's lemma, we find that

$$R \le \frac{2}{(n - 1)\epsilon^2}.$$

Likewise, denoting by R' the probability of the inequality 
we have 

$$R' \le \frac{2}{(n - 1)\epsilon^2}.$$

Thus, in a large number of series it becomes very unlikely that the
value of $Q$ found in actual experiment would lie outside of the interval
$(1 - \epsilon, 1 + \epsilon)$. For instance, the probability for $Q \ge 2$ in 100 series is
surely less than

$$\frac{2}{99}$$

or nearly 0.02. However, this limit is much too high. It would be 
greatly desirable to have a good approximate expression for the proba- 
bility of either one of the inequalities 

$$Q \ge 1 + \epsilon \quad \text{or} \quad Q \le 1 - \epsilon.$$

But this important and difficult problem has not yet been solved. 

5. In order to illustrate the foregoing theoretical considerations we
turn to experiments reported by Charlier in his book "Vorlesungen
über die Grundzüge der mathematischen Statistik" (Lund, 1920). He
made 10,000 drawings of single cards from a complete deck of 52 cards
(each card taken being returned before the next drawing), and noted
the frequency of black cards. The drawings were divided into 1,000
series of 10 cards, or into 200 series of 50 cards. The results are given
in the tables on page 220.

Assuming the independence of trials and the constant probability
$p = \frac{1}{2}$, the theoretical divergence coefficient must be 1. Let us compare
it with the empirical divergence coefficient derived from Tables I and II.
To this end we multiply the squares of numbers in the second column
by the numbers given in the third column. The results are:

For 200 series of 50 cards: $\sum (m_i - ps)^2 = 2{,}487$

For 1,000 series of 10 cards: $\sum (m_i - ps)^2 = 2{,}419$
Table I. Number of Black Cards in 200 Groups of 50 Cards Each

Frequency    Difference    Number of groups with
    m          m - 25        these frequencies
   14           -11                 1
   15           -10                 0
   16            -9                 2
   17            -8                 2
   18            -7                 4
   19            -6                 8
   20            -5                 6
   21            -4                15
   22            -3                13
   23            -2                15
   24            -1                34
   25             0                14
   26             1                21
   27             2                26
   28             3                14
   29             4                10
   30             5                 5
   31             6                 5
   32             7                 3
   33             8                 2

Table II. Number of Black Cards in 1,000 Groups of 10 Cards Each

Frequency    Difference    Number of groups with
    m          m - 5         these frequencies
    0            -5                 3
    1            -4                10
    2            -3                43
    3            -2               116
    4            -1               221
    5             0               247
    6             1               202
    7             2               115
    8             3                34
    9             4                 9
   10             5                 0


Dividing these numbers by $10{,}000 \cdot \frac{1}{4} = 2{,}500$, we get the following
empirical divergence coefficients:

$$D' = 0.9948; \qquad D'' = 0.9676.$$

Both are close to 1, so that the hypotheses of independence of trials
and constant probability for each of them are in good agreement with
empirical results. The second divergence coefficient, corresponding to
more numerous groups, differs from 1 more than the first, corresponding
to only 200 groups. But such a difference can be accounted for by
fluctuations due to chance.

Series of 50 trials are long enough to test the theorem established in
Sec. 2 of this chapter. The quantities denoted there by $A$ and $B$ are
here correspondingly:

$$A = 3.5263$$

$$B = \frac{561}{200} = 2.805$$

whence

$$\frac{A}{B} = 1.2571$$

while

$$\sqrt{\frac{\pi}{2}} = 1.2533.$$

Again the difference, only about $4 \cdot 10^{-3}$, is rather small.

In this example, the probability of drawing a black card was assumed
to be $\frac{1}{2}$. In case we do not know the probability, but suppose it to be
constant throughout 10,000 independent trials, we must consider the
coefficient

$$Q = \frac{n(N - 1)}{(n - 1)M(N - M)} \sum_{i=1}^{n} \left(m_i - \frac{M}{n}\right)^2.$$

In our example

$$n = 1{,}000; \quad N = 10{,}000; \quad M = 4{,}933; \quad s = 10; \quad \frac{M}{n} = 4.933.$$

To evaluate the sum

$$S = \sum_{i=1}^{1{,}000} (m_i - 4.933)^2$$

we write it in the form

$$S = \sum_{i=1}^{1{,}000} (m_i - 5)^2 + 0.134 \sum_{i=1}^{1{,}000} (m_i - 5) + 1{,}000 \cdot (0.067)^2.$$

Now

$$\sum_{i=1}^{1{,}000} (m_i - 5)^2 = 2{,}419$$

$$1{,}000 \cdot (0.067)^2 = 4.489$$

$$0.134 \sum_{i=1}^{1{,}000} (m_i - 5) = -8.978$$

$$S = 2{,}414.51$$

This is to be multiplied by the number

$$\frac{n(N - 1)}{(n - 1)M(N - M)} = \frac{1}{2497.3}.$$

The result is

$$Q = 0.9668,$$

near enough to 1 for us to consider the hypothesis of independence of
trials and the constant value of probability as in agreement with experimental data.
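The figures of this section can be recomputed from Tables I and II. The sketch below transcribes the two tables as mappings from the difference $m - sp$ to the number of groups and evaluates the sums of squares, the empirical divergence coefficients, the quotient $A/B$, and the coefficient $Q$; the function names are illustrative.

```python
import math

# difference (m - s*p) -> number of groups, transcribed from Tables I and II
TABLE_I = {-11: 1, -10: 0, -9: 2, -8: 2, -7: 4, -6: 8, -5: 6, -4: 15, -3: 13,
           -2: 15, -1: 34, 0: 14, 1: 21, 2: 26, 3: 14, 4: 10, 5: 5, 6: 5, 7: 3, 8: 2}
TABLE_II = {-5: 3, -4: 10, -3: 43, -2: 116, -1: 221, 0: 247,
            1: 202, 2: 115, 3: 34, 4: 9, 5: 0}


def sum_of_squares(table):
    """Sum of (m_i - s*p)^2 over all groups."""
    return sum(d * d * count for d, count in table.items())


def empirical_divergence(table, s, p=0.5):
    """Sum of squares divided by N*p*q, with N = n*s."""
    n = sum(table.values())
    return sum_of_squares(table) / (n * s * p * (1 - p))


def ratio_a_over_b(table):
    """A/B, where A is the root mean square and B the mean absolute difference."""
    n = sum(table.values())
    a = math.sqrt(sum_of_squares(table) / n)
    b = sum(abs(d) * count for d, count in table.items()) / n
    return a / b


def tschuprow_q(table, s, center):
    """Q = n(N-1) * sum (m_i - M/n)^2 / ((n-1) M (N-M)); center is s*p of the table."""
    n = sum(table.values())
    big_n = n * s
    big_m = sum((center + d) * count for d, count in table.items())
    mean = big_m / n
    spread = sum((center + d - mean) ** 2 * count for d, count in table.items())
    return n * (big_n - 1) * spread / ((n - 1) * big_m * (big_n - big_m))


if __name__ == "__main__":
    print(sum_of_squares(TABLE_I), sum_of_squares(TABLE_II))
    print(empirical_divergence(TABLE_I, 50), empirical_divergence(TABLE_II, 10))
    print(ratio_a_over_b(TABLE_I), math.sqrt(math.pi / 2))
    print(tschuprow_q(TABLE_II, 10, 5))
```

Computing $Q$ from the differences avoids the intermediate rearrangement of $S$ used in the text, but leads to the same value to four decimals.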


Examples of Dependent Trials 

6. So far we have dealt only with independent variables. But the 
law of large numbers holds, under certain conditions, even in the case of 
dependent variables. Leaving aside generalities, we shall show the appli- 
cation of the law of large numbers to a few interesting problems involving 
dependent variables. 

Let us consider first a Bernoullian series consisting oi n 1 inde- 
pendent trials with the same probability p for an event E, the opposite 
event being denoted by F. We associate with trials 1, 2, ... n variables 
0 ^ 1 , rr 2 , . . . Xn defined as follows: 

Xi — lii E occurs in trials i and i + 1, 

Xi = 0 in all other cases. 

The probability of = 1 evidently is when nothing is known about 
the values of other variables. But if we know that Xi^i = 1, which 
implies the occurrence of E in the ith trial, then the probability of == 1 
is p. Thus, consecutive variables are dependent. However, Xi and Xk 
are independent if |fc — > 1, as we can easily see. Since 

E{Xi) = E{xl) = • 1 ■+ (1 — 39^) . Q = p2 

the expectation of the sum + xg + * * • + will be 
E(xi + 0^2 + * * • + Xn) == 

As to the dispersion of this sum, it can be expressed as follows:

$$B_n = \sum_{i=1}^{n} E(x_i - p^2)^2 + 2\sum_{j>i} E(x_i - p^2)(x_j - p^2).$$

Now

$$E(x_i - p^2)^2 = E(x_i^2) - 2p^2E(x_i) + p^4 = p^2(1 - p^2) \tag{8}$$

and

$$E(x_i - p^2)(x_j - p^2) = E(x_i - p^2)E(x_j - p^2) = 0 \tag{9}$$

for j > i + 1 because then $x_i$ and $x_j$ are independent. But

$$E(x_i - p^2)(x_{i+1} - p^2) = E(x_ix_{i+1}) - p^4 = p^3q \tag{10}$$

since the probability of simultaneous events

$$x_i = 1, \quad x_{i+1} = 1$$

is $p^3$. Taking into account (8), (9), and (10), we find

$$B_n = np^2q(3p + 1) - 2p^3q$$

and the condition

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

is satisfied. Hence, the law of large numbers holds for variables
$x_1, x_2, \ldots x_n$. To express it in the simplest form, it suffices
to notice that the sum

$$x_1 + x_2 + \cdots + x_n$$

represents the number of pairs EE occurring in consecutive trials of the
Bernoullian series of n + 1 trials. Let us denote the frequency of such
pairs by m. Then, referring to the law of large numbers, we get the
following proposition:

If in n consecutive pairs of Bernoullian trials the frequency of double
successes EE is m, then the probability of the inequality

$$\left|\frac{m}{n} - p^2\right| < \epsilon$$

will approach 1 as near as we please, when n becomes sufficiently large.
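
The expectation and dispersion of the number of pairs EE can be confirmed
for small n by brute force. The sketch below is our own illustration, not
part of the original argument: it enumerates every sequence of n + 1
trials with exact rational arithmetic and compares the result with
$np^2$ and with $np^2q(3p + 1) - 2p^3q$. The function names are
hypothetical.

```python
from fractions import Fraction
from itertools import product

def pair_count_moments(n, p):
    """Exact mean and dispersion of the number of EE pairs in n + 1 trials."""
    q = 1 - p
    mean = Fraction(0)
    second = Fraction(0)
    for outcome in product((0, 1), repeat=n + 1):
        # Probability of this particular sequence (1 = E, 0 = F).
        prob = Fraction(1)
        for t in outcome:
            prob *= p if t else q
        # x_i = 1 exactly when trials i and i + 1 both give E.
        pairs = sum(outcome[i] * outcome[i + 1] for i in range(n))
        mean += prob * pairs
        second += prob * pairs * pairs
    return mean, second - mean * mean

def dispersion_formula(n, p):
    """Closed form B_n = n p^2 q (3p + 1) - 2 p^3 q from the text."""
    q = 1 - p
    return n * p ** 2 * q * (3 * p + 1) - 2 * p ** 3 * q

p = Fraction(1, 3)
for n in range(1, 7):
    mean, var = pair_count_moments(n, p)
    print(n, mean, var, dispersion_formula(n, p))
```

Since $B_n$ grows only linearly in n, dividing by $n^2$ gives a quantity
tending to 0, which is the condition used above.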

7. Simple chains of trials, described in Chap. V, Sec. 1, offer a good
example of dependent trials to which the law of large numbers can be
applied. Let $p_1$ be the given probability of an event E in the first
trial. According to the definition of a simple chain, the probability of
E in any subsequent trial is $\alpha$ or $\beta$ according as E occurred
or failed to occur in the preceding trial. By $p_n$ we denote the
probability for E to occur in the nth trial when the results of other
trials are unknown. Let

$$\delta = \alpha - \beta, \qquad p = \frac{\beta}{1 - \delta}.$$

Then, according to the developments in Chap. V, Sec. 2,

$$p_n = p + (p_1 - p)\delta^{n-1},$$

whence

$$\frac{p_1 + p_2 + \cdots + p_n}{n} = p + \frac{p_1 - p}{n} \cdot \frac{1 - \delta^n}{1 - \delta}$$

barring the trivial cases $\delta = 1$ or $\delta = -1$. It follows that
p represents the limit of the mean probability in n trials when n
increases indefinitely, and for that reason p may be called the mean
probability in an infinite chain of trials. When it is known that E has
occurred in the ith trial, its probability of occurring in some
subsequent jth trial is given by

$$p_j^{(i)} = p + q\delta^{j-i}; \qquad q = 1 - p.$$

In the usual way we associate with trials 1, 2, 3, ... variables
$x_1, x_2, x_3, \ldots$ so that in general

$x_i = 1$ when E occurs in the ith trial,

$x_i = 0$ when E fails to occur in the ith trial.

Evidently

$$E(x_i) = E(x_i^2) = p_i.$$

In order to prove that the law of large numbers can be applied to
variables $x_1, x_2, x_3, \ldots$ we must have an idea of the behavior
of $B_n$ for large n. By definition

$$B_n = E(x_1 - p_1 + x_2 - p_2 + \cdots + x_n - p_n)^2 = \sum_{i=1}^{n} E(x_i - p_i)^2 + 2\sum_{j>i} E[(x_i - p_i)(x_j - p_j)].$$

The first sum can easily be found. We have

$$E(x_i - p_i)^2 = p_i - p_i^2 = pq + (q - p)(p_1 - p)\delta^{i-1} - (p_1 - p)^2\delta^{2i-2}$$

whence

$$A = \sum_{i=1}^{n} E(x_i - p_i)^2 \sim npq$$

neglecting terms which remain bounded. As to the second sum, we
observe first that

$$E(x_i - p_i)(x_j - p_j) = E(x_ix_j) - p_ip_j.$$

Again, since the probability of

$$x_ix_j = 1$$

is evidently $p_ip_j^{(i)}$, we have

$$E(x_ix_j) = p_ip_j^{(i)},$$

and

$$E(x_i - p_i)(x_j - p_j) = p_i(p_j^{(i)} - p_j) = pq\delta^{j-i} + (p_1 - p)(q - p)\delta^{j-1} - (p_1 - p)^2\delta^{i+j-2}.$$

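Now, for a fixed i = 1, 2, ... n - 1, we must take the sum of these
expressions letting j run over i + 1, i + 2, ... n. The result of this
summation is

$$pq\,\frac{\delta - \delta^{n-i+1}}{1 - \delta} + (p_1 - p)(q - p)\frac{\delta^{i} - \delta^{n}}{1 - \delta} - (p_1 - p)^2\frac{\delta^{2i-1} - \delta^{n+i-1}}{1 - \delta}.$$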







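Taking i = 1, 2, 3, ... n - 1 and neglecting in the sum the terms
which remain bounded, we get

$$B = \sum_{j>i} E(x_i - p_i)(x_j - p_j) \sim npq\,\frac{\delta}{1 - \delta}$$

whence

$$B_n = A + 2B \sim npq\,\frac{1 + \delta}{1 - \delta}.$$

This asymptotic equality suffices to show that

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty.$$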

Therefore the law of large numbers can be applied to variables
$x_1, x_2, x_3, \ldots$. Since the sum

$$x_1 + x_2 + \cdots + x_n = m$$

represents the frequency of E in n trials, the law of large numbers in
this particular case can be stated as follows: For a fixed
$\epsilon > 0$, no matter how small, the probability of the inequality

$$\left|\frac{m}{n} - \frac{p_1 + p_2 + \cdots + p_n}{n}\right| < \epsilon$$

tends to 1 as $n \to \infty$.

The arithmetic mean

$$\frac{p_1 + p_2 + \cdots + p_n}{n}$$

itself approaches the limit p. It is easy then to express the preceding
theorem thus: The probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

tends to 1 as $n \to \infty$.

This proposition is of exactly the same type as Bernoulli's theorem,
but applies to series of dependent trials.
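
The closed form for $p_n$ and the convergence of the mean probability to
p can be illustrated with exact arithmetic. In the sketch below the
function names and the particular values of $\alpha$, $\beta$ and $p_1$
are our own illustrative choices. The one-step rule
$p_{n+1} = \alpha p_n + \beta(1 - p_n)$ is simply the definition of a
simple chain written as a recursion.

```python
from fractions import Fraction

def chain_probabilities(p1, alpha, beta, n):
    """Return [p_1, ..., p_n] from the one-step rule of a simple chain."""
    probs = [p1]
    for _ in range(n - 1):
        prev = probs[-1]
        # E follows E with probability alpha, and follows not-E with beta.
        probs.append(alpha * prev + beta * (1 - prev))
    return probs

def closed_form(p1, alpha, beta, k):
    """p_k = p + (p_1 - p) * delta^(k-1), with delta and p as in the text."""
    delta = alpha - beta
    p = beta / (1 - delta)
    return p + (p1 - p) * delta ** (k - 1)

# Illustrative values (assumed for this example only).
alpha, beta, p1 = Fraction(1, 2), Fraction(1, 4), Fraction(1, 2)
probs = chain_probabilities(p1, alpha, beta, 30)
mean_probability = sum(probs) / len(probs)
print(float(probs[-1]), float(mean_probability))
```

With these values p = 1/3, and the individual $p_n$ approach p much
faster than their arithmetic mean does, since the mean still carries the
early terms with weight 1/n.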

8. Let a simple chain of N = ns trials be divided into n consecutive
series each consisting of s trials; also, let $m_1, m_2, \ldots m_n$ be
the frequencies of E in each of these series. When N is a large number,
the mean probability in N trials differs little from the quantity
denoted by p. It is natural to modify the definition of the divergence
coefficient given in Sec. 3 by taking p instead of the variable mean
probability in N trials. Thus we define

$$D = E\,\frac{\sum_{i=1}^{n} (m_i - sp)^2}{Npq}.$$

In our case, the variables

$$X_1 = (m_1 - sp)^2, \quad X_2 = (m_2 - sp)^2, \quad \ldots \quad X_n = (m_n - sp)^2$$

are neither identical nor independent, although the degree of dependence
is evidently very slight. These variables can also be presented in the
form

$$(x_a - p + x_{a+1} - p + \cdots + x_{a+s-1} - p)^2 \tag{11}$$

taking successively a = 1, s + 1, 2s + 1, ... (n - 1)s + 1.

To find the mathematical expectation of (11) it suffices to notice that

$$E(x_i - p)^2 = E(x_i - p_i)^2 + (p_i - p)^2 = pq + (q - p)(p_1 - p)\delta^{i-1}$$

$$E(x_i - p)(x_j - p) = E(x_i - p_i)(x_j - p_j) + (p_i - p)(p_j - p) = pq\delta^{j-i} + (p_1 - p)(q - p)\delta^{j-1}$$

and then proceed exactly as in the approximate evaluation of $B_n$ in
Sec. 7. The final result is

$$E(x_a - p + x_{a+1} - p + \cdots + x_{a+s-1} - p)^2 = spq\,\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{(1 - \delta)^2}\,\delta^{a-1} + \frac{2pq}{(1 - \delta)^2}\,\delta^{s+1} - \frac{(q - p)(p_1 - p)}{(1 - \delta)^2}\,[2s(1 - \delta) + 1 + \delta]\,\delta^{a+s-1}.$$


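For somewhat large s the two last terms in the right member are
completely negligible; so is the third term if $a \geq s + 1$. Hence,
with a good approximation,

$$E(X_1) = spq\,\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{(1 - \delta)^2}$$

and

$$E(X_i) = spq\,\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} \quad \text{if } i > 1.$$

Adding these expectations and dividing by Npq, we find

$$D = \frac{1 + \delta}{1 - \delta} - \frac{2\delta}{s(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{Npq(1 - \delta)^2}.$$

Again, when N is large, the last term can be dropped and as a good
approximation to D we can take

$$D = \frac{1 + \delta}{1 - \delta} - \frac{2\delta}{s(1 - \delta)^2}. \tag{12}$$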

It can be shown that the law of large numbers holds for variables $X_1$,
$X_2, \ldots X_n$ and therefore when n (or the number of series) is
large, the empirical divergence coefficient is not likely to differ
considerably from D as given by the above approximate formula.
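
The approximate formula for D is simple enough to evaluate directly. The
following sketch (the function name is ours) computes it from $\alpha$,
$\beta$ and the series length s. The three parameter sets are those of
the card experiments described in Secs. 9 and 10.

```python
def divergence_coefficient(alpha, beta, s):
    """Approximate theoretical divergence coefficient of a simple chain."""
    delta = alpha - beta
    return (1 + delta) / (1 - delta) - 2 * delta / (s * (1 - delta) ** 2)

# (alpha, beta, s) for the experiments of Secs. 9 and 10.
experiments = [(10 / 20, 5 / 20, 25), (13 / 20, 7 / 20, 20), (7 / 20, 13 / 20, 20)]
for alpha, beta, s in experiments:
    print(round(divergence_coefficient(alpha, beta, s), 3))
# prints 1.631, 1.796 and 0.556
```

A positive $\delta$ gives D > 1 and a negative $\delta$ gives D < 1,
which matches the contrast between the two experiments of Sec. 10.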

9. In order to see how far the theory of simple chains agrees with 
actual experiments, the author of this book himself has done extensive 
experimental work. To form a chain of trials, one can take two sets of 
cards containing red and black cards in different proportions, and 
proceed to draw one card at a time (returning it to the pack in which it 
belongs after each drawing) according to the following rules: At the 
outset one card is taken from a pack which we shall call the first set; 
then, whenever a red card is drawn, the next card is taken from the first 
set; but after a black card, the next one is taken from the second set. 
Evidently, these rules completely determine a series of trials possessing 
properties of a simple chain. In the first experiment the first pack 
contained 10 red and 10 black cards, while the second pack contained 5 
red and 15 black cards. Altogether, 10,000 drawings were made, and 
following their natural order, they were divided into 400 series of 25 
drawings each. The results are given in Table III. 

Table III. — Distribution of Red Cards in 400 Series of 25 Cards

Frequency of     Difference,     Number of series
red cards, m     m - 8           with these frequencies

      1              -7                  2
      2              -6                  4
      3              -5                  8
      4              -4                 27
      5              -3                 29
      6              -2                 54
      7              -1                 37
      8               0                 52
      9               1                 47
     10               2                 44
     11               3                 41
     12               4                 20
     13               5                 20
     14               6                  7
     15               7                  4
     16               8                  3
     17               9                  1

The sum of the numbers in column 3 is 400, as it should be. Taking 
the sum of the products of numbers in columns 1 and 3, we get 3,323, which 
is the total number of red cards. The relative frequency of red cards in 
10,000 trials is, therefore, 


0.3323. 
In our case

$$\alpha = \tfrac{1}{2}, \quad \beta = \tfrac{1}{4}, \quad \delta = \tfrac{1}{4}$$

and the mean probability p in an infinite series of trials

$$p = \tfrac{1}{3} = 0.3333.$$

Thus, the relative frequency observed differs from p only by $10^{-3}$
and in this respect the agreement between theory and experiment is very
satisfactory. Now let us consider the theoretical divergence coefficient
for which we have the approximate expression

$$D = \frac{1 + \delta}{1 - \delta} - \frac{2\delta}{s(1 - \delta)^2}.$$

Here we must substitute $\delta = \tfrac{1}{4}$, s = 25. The result is

D = 1.631, approximately.

To find the empirical divergence coefficient we must first evaluate the
sum

$$S = \sum \left(m - \tfrac{25}{3}\right)^2$$

extended over all 400 series. For the sake of easier calculation, we
present S thus:

$$S = \sum (m - 8)^2 - \tfrac{2}{3}\sum (m - 8) + \tfrac{400}{9}.$$

Now from Table III we get

$$\sum (m - 8)^2 = 3{,}521; \qquad \sum (m - 8) = 123$$

whence

S = 3,483.4.

Dividing this number by 20,000/9 = 2,222.2, we find the empirical
divergence coefficient

D' = 1.568

which differs from D = 1.631 by only about 0.06, well within reasonable
limits.
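
The drawing rules of this experiment are easy to imitate with a
pseudo-random generator. The sketch below is our own illustration and
does not reproduce the author's data: drawing a card with replacement is
modeled by an independent uniform random number, and the function names,
the seed and the number of series are arbitrary choices.

```python
import random

def simulate_series(first_red, second_red, pack_size, n_series, s, seed=0):
    """Counts of red cards in n_series consecutive series of s drawings."""
    rng = random.Random(seed)
    use_first = True  # the very first card is taken from the first set
    counts = []
    for _ in range(n_series):
        reds = 0
        for _ in range(s):
            red_cards = first_red if use_first else second_red
            red = rng.random() < red_cards / pack_size
            reds += red
            # After a red card draw from the first set, after a black card
            # from the second set.
            use_first = red
        counts.append(reds)
    return counts

def empirical_divergence(counts, s, p):
    """Sum of (m - s p)^2 over all series, divided by N p q."""
    N = len(counts) * s
    return sum((m - s * p) ** 2 for m in counts) / (N * p * (1 - p))

counts = simulate_series(10, 5, 20, n_series=4000, s=25, seed=1)
frequency = sum(counts) / (len(counts) * 25)
print(frequency, empirical_divergence(counts, 25, 1 / 3))
```

A run of this kind should give a relative frequency near 1/3 and an
empirical divergence coefficient in the neighborhood of 1.63, with
fluctuations of the same order as those seen in the text.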

10. In two other experiments two packs were used: one containing 
13 red and 7 black cards, and another 7 red and 13 black cards. In 
one experiment the pack with 13 red cards was considered as the first 
deck, and in the other experiment it became the second deck. The 
new experiments were conducted in the same way as that described in 
Sec. 9, but they were both carried to 20,000 trials divided into 1,000 
series of 20 trials each. In the first experiment, we have 

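$$\alpha = \tfrac{13}{20}, \quad \beta = \tfrac{7}{20}, \quad \delta = \tfrac{3}{10}, \quad p = \tfrac{1}{2}$$

and

D = 1.796, approximately,

while the same quantities for the second experiment are

$$\alpha = \tfrac{7}{20}, \quad \beta = \tfrac{13}{20}, \quad \delta = -\tfrac{3}{10}, \quad p = \tfrac{1}{2}$$

and

D = 0.556, approximately.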

The results of these experiments are recorded in the following two 
tables: 


Table IV. — Concerning the First Experiment

Frequency of     Difference,     Number of series
red cards, m     m - 10          with these frequencies

      2              -8                  3
      3              -7                  5
      4              -6                 18
      5              -5                 36
      6              -4                 59
      7              -3                 93
      8              -2                103
      9              -1                117
     10               0                128
     11               1                121
     12               2                101
     13               3                 93
     14               4                 48
     15               5                 39
     16               6                 26
     17               7                  7
     18               8                  1
     19               9                  1
     20              10                  1

Table V. — Concerning the Second Experiment

Frequency of     Difference,     Number of series
red cards, m     m - 10          with these frequencies

      5              -5                  2
      6              -4                 10
      7              -3                 48
      8              -2                112
      9              -1                193
     10               0                251
     11               1                201
     12               2                113
     13               3                 56
     14               4                  9
     15               5                  5
Taking the sum of the products of numbers in columns 1 and 3, we
find

10,036 and 10,045

as the total number of red cards in the first and second experiments.
Dividing these numbers by 20,000, we have the following relative
frequencies of red cards:

0.5018 and 0.50225

extremely near to p = 0.5. From the first table we find that

$$\sum (m - 10)^2 = 8{,}924$$

summation being extended over all 1,000 series. Dividing this number
by 20,000 · 1/4 = 5,000, we find the empirical divergence coefficient in
the first experiment

D' = 1.785

which comes close to

D = 1.796.

Likewise, from the second table we find

$$\sum (m - 10)^2 = 2{,}709,$$

whence, dividing by 5,000,

D'' = 0.5418

again close to

D = 0.5562.

Thus, all the essential circumstances foreseen theoretically, for simple
chains of trials, are in excellent agreement with our experiments.
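
The totals quoted for Tables III, IV and V can be re-derived from the
tabulated frequencies. The sketch below is our own check: the dictionaries
map each frequency m to the number of series showing it, and the helper
name `summarize` is hypothetical. Exact fractions are used so that the
empirical coefficients come out without rounding error.

```python
from fractions import Fraction

TABLE_III = {1: 2, 2: 4, 3: 8, 4: 27, 5: 29, 6: 54, 7: 37, 8: 52, 9: 47,
             10: 44, 11: 41, 12: 20, 13: 20, 14: 7, 15: 4, 16: 3, 17: 1}
TABLE_IV = {2: 3, 3: 5, 4: 18, 5: 36, 6: 59, 7: 93, 8: 103, 9: 117,
            10: 128, 11: 121, 12: 101, 13: 93, 14: 48, 15: 39, 16: 26,
            17: 7, 18: 1, 19: 1, 20: 1}
TABLE_V = {5: 2, 6: 10, 7: 48, 8: 112, 9: 193, 10: 251, 11: 201,
           12: 113, 13: 56, 14: 9, 15: 5}

def summarize(table, s, p):
    """Return (number of series, total red cards, empirical coefficient)."""
    n_series = sum(table.values())
    reds = sum(m * k for m, k in table.items())
    N = n_series * s
    S = sum(k * (m - s * p) ** 2 for m, k in table.items())
    return n_series, reds, S / (N * p * (1 - p))

for table, s, p in [(TABLE_III, 25, Fraction(1, 3)),
                    (TABLE_IV, 20, Fraction(1, 2)),
                    (TABLE_V, 20, Fraction(1, 2))]:
    n_series, reds, d_emp = summarize(table, s, p)
    print(n_series, reds, float(d_emp))
```

The three empirical coefficients obtained this way are 1.56755, 1.7848
and 0.5418, in agreement with the rounded values given in the text.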

Problems for Solution 

1. From an urn originally containing a white and b black balls, n balls
are drawn in succession, each ball drawn being replaced by 1 + c (c > 0)
balls of the same color before the next drawing. If m is the frequency
of white balls, show that the probability of the inequality

$$\left|\frac{m}{n} - \frac{a}{a + b}\right| < \epsilon$$

does not tend to 1 as n increases indefinitely (Markoff, G. Pólya).

Indication of the Proof. If $x_i = 1$ or $x_i = 0$, according as a white
or a black ball appears in the ith drawing, we have

$$E(x_i) = E(x_i^2) = \frac{a}{a + b}.$$

Hence

$$E\left(x_1 + x_2 + \cdots + x_n - \frac{na}{a + b}\right)^2 = \frac{n^2abc}{(a + b)^2(a + b + c)} + \frac{nab}{(a + b)(a + b + c)}.$$
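
The expression for the second moment in Problem 1 can be verified for
small n by enumerating all color sequences. The sketch below is our own
check with hypothetical function names. After each drawing the ball is
returned together with c further balls of the same color, which is how we
read "replaced by 1 + c balls."

```python
from fractions import Fraction
from itertools import product

def urn_second_moment(a, b, c, n):
    """Exact E(x_1 + ... + x_n - n a/(a+b))^2 for the urn of Problem 1."""
    center = Fraction(n * a, a + b)
    total = Fraction(0)
    for seq in product((0, 1), repeat=n):      # 1 = white, 0 = black
        white, black, prob = a, b, Fraction(1)
        for x in seq:
            if x:
                prob *= Fraction(white, white + black)
                white += c                     # net gain of c white balls
            else:
                prob *= Fraction(black, white + black)
                black += c                     # net gain of c black balls
        total += prob * (sum(seq) - center) ** 2
    return total

def urn_formula(a, b, c, n):
    """Closed form stated in the indication of the proof."""
    return (Fraction(n * n * a * b * c, (a + b) ** 2 * (a + b + c))
            + Fraction(n * a * b, (a + b) * (a + b + c)))

for n in range(1, 9):
    print(n, urn_second_moment(2, 3, 1, n), urn_formula(2, 3, 1, n))
```

Because the leading term is of order $n^2$, the dispersion of m/n does not
tend to 0, which is why the law of large numbers fails here.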


2. Marbe's Problem. A group of exactly m uninterrupted successes E or
failures F in a Bernoullian series of trials with the probability p for a
success is called an "m sequence." If N is the frequency of m sequences
in n trials, show that the probability of the inequality

$$\left|\frac{N}{n} - (p^mq^2 + p^2q^m)\right| < \epsilon$$

for a fixed $\epsilon$ converges to 1 as n becomes infinite.

Indication of the Proof. Associate with each of the $\mu = n - m + 1$
first trials variables $x_1, x_2, \ldots x_\mu$ assuming only two values,
0 and 1. For $1 < i < \mu$ we set $x_i = 1$ if, beginning with the ith
trial, a succession of m letters E or F is preceded and followed by F or
E. In all other cases $x_i = 0$. We set $x_1 = 1$ if, beginning with the
first trial, there is a succession of m letters E or F ended by F or E;
otherwise $x_1 = 0$. Finally, $x_\mu = 1$ if, beginning with the
$\mu$th trial there is a succession of m letters E or F preceded by F or
E, otherwise $x_\mu = 0$. Show that

$$E(x_1 + x_2 + \cdots + x_\mu) = (n - m - 1)(p^mq^2 + p^2q^m) + 2(p^mq + pq^m)$$

$$E(x_1 + x_2 + \cdots + x_\mu)^2 = n^2(p^mq^2 + p^2q^m)^2 + nP$$

where P remains bounded.

3. The following interesting series of dependent trials has been
suggested by S. Bernstein: Two urns contain white and black balls. The
probabilities of drawing white balls from the first and second urns are,
respectively, p and p'. The probabilities of drawing black balls from the
same urns are q = 1 - p and q' = 1 - p'. Finally, the probability of
taking a ball from the first urn at the outset of the trials is
$\alpha$. A series of trials is uniquely defined by the following rule:
Whenever a white ball is drawn (and returned), the next ball is drawn
from the same urn; but when a black ball is drawn, the next ball is taken
from the other urn. Let $a_n$ be the probability that the nth ball will
be drawn from the first urn when the results of other drawings remain
unknown. Under the same assumption, let $p_n$ be the probability of the
nth ball being white. Find general expressions of $a_n$ and $p_n$.

Hint:

$$a_{n+1} = a_n(p + p' - 1) + 1 - p'$$

whence

$$a_n = \frac{1 - p'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{n-1}.$$

Also

$$p_n = a_np + (1 - a_n)p'$$

whence

$$p_n = \frac{p + p' - 2pp'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p - p')(p + p' - 1)^{n-1}.$$




4. When it becomes known that in the ith trial a white ball was drawn,
what are the probabilities $a_j^{(i)}$ and $p_j^{(i)}$ of taking a ball
from the first urn in the jth (j > i) trial and of drawing a white ball
in the same trial?

Hint: The probability that it was the first urn from which a white ball
was drawn in the ith trial is determined by Bayes' formula:

$$a_{i+1}^{(i)} = \frac{a_ip}{p_i}.$$

For $n \geq i + 1$

$$a_{n+1}^{(i)} = a_n^{(i)}(p + p' - 1) + 1 - p'$$

whence

$$a_j^{(i)} = \frac{1 - p'}{2 - p - p'} + \left(\frac{a_ip}{p_i} - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{j-i-1}$$

for $j \geq i + 1$. Furthermore

$$p_j^{(i)} = a_j^{(i)}p + (1 - a_j^{(i)})p'$$

for $j \geq i + 1$.

5. From now on we shall assume p + p' = 1 or p' = q, q' = p. Show that
the law of large numbers can be applied to variables
$x_1, x_2, x_3, \ldots$ which are defined in the usual way:

$x_i = 1$ if a white ball is drawn in the ith trial,

$x_i = 0$ if a black ball is drawn in the ith trial.

Indication of the Proof. Evidently $E(x_i) = E(x_i^2) = p_i$.
Furthermore

$$B_n = \sum_{i=1}^{n} E(x_i - p_i)^2 + 2\sum_{j>i} E(x_i - p_i)(x_j - p_j),$$

$$E(x_i - p_i)^2 = 2pq(1 - 2pq); \quad i > 1$$

$$E(x_1 - p_1)^2 = pq + \alpha(1 - \alpha)(p - q)^2.$$

For j > i > 1

$$E(x_i - p_i)(x_j - p_j) = 0 \quad \text{if } j > i + 1$$

$$E(x_i - p_i)(x_{i+1} - p_{i+1}) = pq(1 - 4pq).$$

For i = 1 and j > 1

$$E(x_1 - p_1)(x_j - p_j) = 0 \quad \text{if } j > 2$$

$$E(x_1 - p_1)(x_2 - p_2) = \alpha p^2 + (1 - \alpha)q^2 - (1 - 2pq)(q + (p - q)\alpha).$$

Hence

$$B_n \sim 4pq(1 - 3pq)n$$

and the law of large numbers holds. It can be stated as follows: If in n
trials the frequency of white balls is m, then the probability of the
inequality

$$\left|\frac{m}{n} - (p^2 + q^2)\right| < \epsilon$$

tends to 1 as n tends to infinity for any given positive number
$\epsilon$.

6. Let $r = p^2 + q^2$ be the mean probability in infinitely many trials.
Find the divergence coefficient

$$D = E\,\frac{\sum_{i=1}^{n} (m_i - sr)^2}{Nr(1 - r)}$$

when N = ns trials are divided in n consecutive groups containing s
trials each.

Indication of Solution. From the foregoing formulas it follows that

$$E(x_a - r + x_{a+1} - r + \cdots + x_{a+s-1} - r)^2 = 4spq(1 - 3pq) - 2pq(1 - 4pq)$$

if a > 1. Hence

$$E\sum_{i=2}^{n} (m_i - sr)^2 = 4Npq(1 - 3pq) - 4spq(1 - 3pq) - 2(n - 1)pq(1 - 4pq).$$

Again

$$E(m_1 - sr)^2 = 4spq(1 - 3pq) - 2pq(3 - 10pq) + p(1 - 6q + 12q^2 - 4q^3) - \alpha(p - q)(1 - 8pq)$$

so that finally

$$D = \frac{2 - 6pq}{1 - 2pq} - \frac{1 - 4pq}{s(1 - 2pq)} + \frac{(p - \alpha)(p - q)(1 - 8pq)}{2Npq(1 - 2pq)}.$$

For large N with a good approximation

$$D = \frac{2 - 6pq}{1 - 2pq} - \frac{1 - 4pq}{s(1 - 2pq)}.$$


7. Two sets of cards containing respectively 12 red and 4 black cards (the first 
deck) and 4 red and 12 black cards (the second deck) were used in the
following experiment: The first card was taken from the first deck, and
in the following trials, after
a red card the next one was taken from the same deck, but after a black one the next 
card was taken from the other deck. Altogether 25,000 cards were drawn, and in their 
natural order were divided in 1,000 series of 25 cards each. The results are recorded 
in Table VI. How close is the agreement between this experiment and the theory? 

Table VI. — Distribution of Red Cards in 1,000 Series of 25 Cards

Frequency of     Difference,     Number of series
red cards, m     m - 16          with these frequencies

      6             -10                  1
      7              -9                  1
      8              -8                  1
      9              -7                 12
     10              -6                 13
     11              -5                 43
     12              -4                 65
     13              -3                 92
     14              -2                101
     15              -1                162
     16               0                 94
     17               1                164
     18               2                 68
     19               3                110
     20               4                 26
     21               5                 28
     22               6                 10
     23               7                  7
     24               8                  1
     25               9                  1
Ans. In the present case p = 3/4, p' = q = 1/4. Mean probability in
infinitely many trials:

$$p^2 + q^2 = \tfrac{5}{8} = 0.625.$$

Theoretical divergence coefficient: D = 1.384. Frequency of red cards:
15,696. Relative frequency:

15,696/25,000 = 0.62784,

close to 0.625.

Empirical divergence coefficient: D' = 1.3845, very close to 1.384.

The probability of taking a card from the second deck is 0.25. Now, by
actual counting, it was found that in 7,500 trials a card was taken from
the second deck 1,856 times. Hence, the relative frequency of this event
in 7,500 trials is

1,856/7,500 = 0.2475,

again very close to 0.25.
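
The figures in this answer can be recomputed from Table VI and from the
approximate formula of Problem 6. The sketch below is our own check: the
variable names are hypothetical, and the entry for m = 19 is read as 110,
the only value that brings the number of series to 1,000.

```python
# Number of series for each frequency m of red cards (Table VI).
TABLE_VI = {6: 1, 7: 1, 8: 1, 9: 12, 10: 13, 11: 43, 12: 65, 13: 92,
            14: 101, 15: 162, 16: 94, 17: 164, 18: 68, 19: 110, 20: 26,
            21: 28, 22: 10, 23: 7, 24: 1, 25: 1}

p, q, s = 0.75, 0.25, 25
r = p * p + q * q                      # mean probability, 0.625

n_series = sum(TABLE_VI.values())
reds = sum(m * k for m, k in TABLE_VI.items())
N = n_series * s

# Approximate theoretical coefficient from Problem 6 (large N).
D_theory = (2 - 6 * p * q) / (1 - 2 * p * q) - (1 - 4 * p * q) / (s * (1 - 2 * p * q))

# Empirical coefficient: sum of (m - s r)^2 divided by N r (1 - r).
S = sum(k * (m - s * r) ** 2 for m, k in TABLE_VI.items())
D_empirical = S / (N * r * (1 - r))

print(n_series, reds, reds / N, D_theory, D_empirical)
```

The empirical coefficient comes out at about 1.3846, so the agreement
claimed in the answer holds to three decimal places.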
CHAPTER XII


PROBABILITIES IN CONTINUUM 

1. In the preceding parts of this book, whenever we dealt with 
stochastic variables, it was understood that their range of variation was 
represented by a finite set of numbers. Although, for the sake of better 
understanding of the subject, it was natural to begin with this simplest 
case, there are many reasons why it is necessary to introduce into the 
calculus of probability stochastic variables with infinitely many values. 
Such variables present themselves naturally in many cases of the type of
Buffon's needle problem which we had occasion to mention in Chap. VI.

On the other hand, even in dealing with stochastic variables with a 
finite, but very large number of values, it is often profitable for the sake 
of approximate evaluations, to substitute for them fictitious variables 
with infinitely many values. Among these the most important ones by 
far are continuous variables. 

Case of One Variable

2. Beginning with the case of a single continuous variable x, we must
assume that its range of variation is known and represented by a given
interval (a, b), finite or infinite. The knowledge only of the range of
variation of x would not enable us to consider x as a stochastic variable;
to be able to do so, we must introduce in some form or other the
considerations of probability. For a continuous variable it is as
unnatural to speak of the probability of any selected single value, as it
is to speak of
the dimension of a single selected point on a line. But just as we speak 
of the length of a segment of a line, we may introduce the notion of the 
probability that x will be confined to a given interval (c, d), part of (a, b). 

In introducing this new notion of probability in any manner whatso- 
ever, we must be careful not to fall into contradiction with the laws of 
probability which are assumed as fundamental. To this end, if P(c, d)
is the probability for x to lie in the interval (c, d), we are led to
assume

1° $P(c, d) \geq 0$

2° $P(a, b) = 1$.

The first assumption is an expression of the fact that probability
can never be negative. The second assumption corresponds to the fact
that x certainly assumes one out of the totality of its possible values.

Next, if the interval (c, d) is divided into two adjoining intervals
(c, e) and (e, d), we assume

3° $P(c, d) = P(c, e) + P(e, d)$

in conformity with the theorem of total probability.

For continuous variables it is furthermore assumed: 4° for an
infinitesimal interval (c, d), P(c, d) is also infinitesimal.

Properties 3° and 4° show that P(c, d) is a continuous function of c
and d and that

$$P(c, c) = 0.$$

In other words, the probability that x will assume any given value is 0.
At the same time P(c, d) represents the probability of any one of the four
inequalities

$$c < x < d; \quad c \leq x < d; \quad c < x \leq d; \quad c \leq x \leq d.$$

3. A simple example will serve to clarify these general considerations.
A small ball of negligible dimensions is made to move on the rim of a 
circular disk. It is set in motion by a vehement impulse and after many 
complete revolutions, retarded by friction and the resistance of the air, 
comes to rest. The variety and complexity of causes influencing the 
motion of the ball make it impossible to foresee the final position of the 
ball when it comes to rest and the whole phenomenon bears characteristic 
features of a play of chance. The stochastic variable associated with this 
chance phenomenon is the distance from a certain definite point on the 
rim (origin) to the final position of the ball, counted in a definite direction, 
for example, clockwise. This variable, when we consider the ball as a 
mere point, may have any value between 0 and the length of the rim. 
The question now arises, how to define the probability that the ball will 
stop in a specified portion of the rim, or else that the variable we consider 
will have a value belonging to a definite interval, part of its total range 
of variation. In trying to define this probability, we must observe the 
fundamental requirements set forth in Sec. 2. Besides that, we must of 
necessity resort to considerations which are not mathematical in their 
nature but are based partly on aprioristic and partly on experimental 
grounds. Suppose we take two equal arcs on the rim. There is nothing 
perceptible a priori that would make the ball stop in one arc rather than 
in another. Besides, actual experiments show that the ball stops in one 
arc approximately the same number of times as in another, and this 
experimental knowledge together with aprioristic considerations suggests 
the assumption that we must attribute equal probabilities to equal arcs, 
irrespective of the position of the arcs on the rim. As soon as we agree on 
this assumption or hypothesis, the problem becomes mathematical and 
can easily be solved. 
Before proceeding to the solution, a remark on the meaning of zero
probability in connection with continuous variables is not out of place. 
Zero probability in this case does not mean logical impossibility. We 
attribute zero probability to the event that the ball will stop precisely 
at the origin. However, that possibility is not altogether excluded 
so far as we consider the origin and the ball as mere points. The question 
lacks sense if we deal with a material ball and a material rim, no matter 
how small the former and how fine the latter. 

4. A stochastic variable is said to have uniform distribution of 
probability if probabilities attached to two equal intervals are equal. 
This means that P(c, d) depends only upon the length d - c = s of the
interval (c, d) and accordingly can be denoted simply by P(s).
Combining two adjoining intervals of the respective lengths s and s' into
a single interval of length s + s', according to requirement 3°, we must
have

$$P(s + s') = P(s) + P(s'). \tag{1}$$

Suppose now that the interval (a, b) of the length b - a = l,
representing the whole range of variation of x, is divided into n equal
intervals of the length l/n. The repeated application of equation (1)
gives

$$P(l) = nP\left(\frac{l}{n}\right).$$

But by requirement 2° P(l) = 1 and hence

$$P\left(\frac{l}{n}\right) = \frac{1}{n}.$$

Again, repeated application of (1) gives

$$P\left(\frac{m}{n}l\right) = \frac{m}{n}$$

for any integer $m \leq n$. Now let us take any interval of length s. For
an appropriate m it will contain the interval $\frac{m}{n}l$ and be
contained in the interval $\frac{m+1}{n}l$; hence, referring to
requirements 1° and 3°, we shall have

$$\frac{m}{n} \leq P(s) \leq \frac{m + 1}{n}$$

while

$$\frac{m}{n}l \leq s < \frac{m + 1}{n}l$$

or

$$\frac{m}{n} \leq \frac{s}{l} < \frac{m + 1}{n}.$$

Since P(s) and s/l are contained in the same interval of length 1/n,

$$\left|P(s) - \frac{s}{l}\right| \leq \frac{1}{n}$$

and this being true for an arbitrary n, no matter how large, it follows
that

$$P(s) = \frac{s}{l}.$$

Thus for a variable x with uniform distribution of probability, the
probability of assuming a value belonging to an interval of length s is
given by the ratio of s to the length l of the whole range of variation
of x.

5. In the general case, when we cannot assume the uniform distribu- 
tion of probability throughout the whole range of variation of x, we let 
ourselves be guided by an analogy with a mass distributed continuously 
over a line. In fact, the distribution of a mass satisfies all the
requirements set forth for probability. In particular, the mass
$\Delta m$ contained in an infinitesimal interval $(z, z + \Delta z)$ is
also infinitesimal and the mean density

$$\frac{\Delta m}{\Delta z}$$

is generally supposed to tend, with $\Delta z$ converging to 0, to a
limit called "density at the point z." If this density $\rho(z)$ is
known, the mass contained in any interval (c, d) is represented by an
integral

$$\int_c^d \rho(z)\,dz.$$

Following this analogy we admit that the mean density of probability

$$\frac{P(z, z + \Delta z)}{\Delta z}$$

tends to a limit f(z): density of probability at the point z when the
length of the interval $\Delta z$ tends to 0. Hence, again the
probability corresponding to an interval (c, d) will be represented by
the integral

$$P(c, d) = \int_c^d f(z)\,dz.$$

This expression satisfies all the requirements of Sec. 2 if the density of
the probability f(z) is subject to two conditions:

(a) $f(z) \geq 0$ for all z in (a, b).

(b) $\int_a^b f(z)\,dz = 1$.

The second condition implies, of course, the existence of the integral
itself. But in all cases of any importance the density is continuous,
save for discontinuities of the simplest kind which do not cause any
doubts as to the existence of the above integral.

From the general expression of P(c, d) it follows that for an
infinitesimal interval (z, z + dz) the probability is given by f(z)dz
neglecting infinitesimals of a higher order. For the uniform distribution
of probability over an interval of length l the density is constant and
= 1/l.

In other cases we cannot expect to obtain a definite expression for 
density unless the variable itself is sufficiently characterized by
additional conditions, either hypothetical or implied by the problem.
Thus,
for instance, in applications of probability to problems of theoretical 
physics, the physicists have succeeded in obtaining definite probability 
distributions by invoking physical laws of admitted universal validity 
together with some plausible hypotheses. 

6. The interval containing all possible values of a stochastic variable 
may be finite or infinite according to the nature of that variable.
However, in all cases we may take the largest possible interval from
$-\infty$ to $+\infty$; to this end it suffices to define the density
outside of the originally given interval as being = 0. Then the density
will be defined for all real values of z and will satisfy the conditions:

(a) $f(z) \geq 0$ for all z

(b) $\int_{-\infty}^{\infty} f(z)\,dz = 1$.

Furthermore, the probability for x to be in any interval (c, d) will be
given by

$$P(c, d) = \int_c^d f(z)\,dz.$$

In particular, taking $c = -\infty$ and writing t instead of d,

$$F(t) = \int_{-\infty}^{t} f(z)\,dz$$

represents the probability that x will not exceed or will be less than t.
Considered as a function of t, F(t) is never decreasing and varies
between $F(-\infty) = 0$ and $F(+\infty) = 1$. It is called the
"distribution function of probability." In case x has uniform
distribution of probability over an interval (a, b) its distribution
function is evidently defined as follows:

$$F(t) = 0 \quad \text{for } t < a$$

$$F(t) = \frac{t - a}{b - a} \quad \text{for } a \leq t \leq b$$

$$F(t) = 1 \quad \text{for } t > b.$$

Its graph is shown in Fig. 1 on page 240.
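
The distribution function of the uniform case translates directly into a
piecewise function, and the probability of an interval (c, d) is then the
difference F(d) - F(c). The following short sketch uses function names of
our own choosing.

```python
def uniform_cdf(t, a, b):
    """Distribution function F(t) of a uniform variable on (a, b)."""
    if t < a:
        return 0.0
    if t > b:
        return 1.0
    return (t - a) / (b - a)

def interval_probability(c, d, a, b):
    """P(c, d) = F(d) - F(c) for c <= d."""
    return uniform_cdf(d, a, b) - uniform_cdf(c, a, b)

print(uniform_cdf(0.5, 0.0, 2.0))                # 0.25
print(interval_probability(0.5, 1.5, 0.0, 2.0))  # 0.5
```

For intervals lying inside (a, b) the result depends only on the length
d - c, in agreement with the ratio s/l found in Sec. 4.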
7. The definition of mathematical expectation can easily be extended
to continuous variables; namely, the expectation of x or the mean value
of x is defined by

$$E(x) = \int_{-\infty}^{\infty} z f(z)\,dz$$

provided this integral exists. Similarly, the mathematical expectation
of any function $\varphi(x)$ is given by

$$E[\varphi(x)] = \int_{-\infty}^{\infty} \varphi(z) f(z)\,dz.$$

Of course, the existence of the integral in the right member is
presupposed again. When this integral does not exist, it is meaningless
to speak of the mathematical expectation of $\varphi(x)$.

[Fig. 1. Graph of the distribution function F(t) of the uniform
distribution.]

The mathematical expectation of the power $x^n$ with positive integer
exponent is called the moment of the order n or nth moment. We shall
denote it by $m_n$ so that

$$m_n = \int_{-\infty}^{\infty} z^n f(z)\,dz.$$

The dispersion D and the standard deviation $\sigma$ of x are defined in
the same way as in Chap. IX; namely,

$$D = \sigma^2 = E(x - m_1)^2 = \int_{-\infty}^{\infty} (z - m_1)^2 f(z)\,dz.$$

Often it is advisable to consider the mathematical expectation of
$|x|^a$ where a may be any real number, ordinarily positive. This
expectation is called the "absolute moment of the order a." Its
expression is

$$\mu_a = \int_{-\infty}^{\infty} |z|^a f(z)\,dz,$$

and it is evident that

$$m_{2k} = \mu_{2k}; \qquad |m_{2k+1}| \leq \mu_{2k+1}.$$

The mathematical expectation of the function

$$e^{itx}$$

where t is a real variable, is of the utmost importance. It is called the
"characteristic function" of distribution, and is defined by

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itz} f(z)\,dz.$$

Since $f(z) \geq 0$ and

$$\int_{-\infty}^{\infty} f(z)\,dz = 1$$

the integral defining $\varphi(t)$ is always convergent and

$$|\varphi(t)| \leq 1.$$

The distribution is completely determined by its characteristic
function, because by the Fourier theorem

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,dt \int_{-\infty}^{\infty} e^{itz} f(z)\,dz = f(x)$$

at all points of continuity of f(x). But the left-hand member is

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt$$

by the definition of $\varphi(t)$ and so

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt.$$
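
As a numerical illustration of the definition, the characteristic
function of a simple density can be approximated by a midpoint sum. The
sketch below is ours: the function names are hypothetical, the density is
the uniform one on (0, 1), and the comparison value $(e^{it} - 1)/(it)$
is the elementary integral of $e^{itz}$ over that interval.

```python
import cmath

def characteristic_function(density, lo, hi, t, steps=20_000):
    """Midpoint-rule approximation of the integral of e^{itz} f(z) on (lo, hi)."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        z = lo + (k + 0.5) * h
        total += cmath.exp(1j * t * z) * density(z)
    return total * h

def uniform_density(z):
    """Density of the uniform distribution on (0, 1)."""
    return 1.0

for t in (0.5, 3.0, 10.0):
    approx = characteristic_function(uniform_density, 0.0, 1.0, t)
    exact = (cmath.exp(1j * t) - 1) / (1j * t)
    print(t, abs(approx - exact), abs(approx))
```

In every case the modulus stays below 1, as the inequality
$|\varphi(t)| \leq 1$ requires, and it equals 1 only at t = 0.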

8. To illustrate the preceding general explanations we shall now con- 
sider a few examples. 

Example 1. Let x be a variable with uniform distribution of probability over
the interval (0, 1). The density of this distribution being constant,

$$f(z) = 1,$$

the mean value of x is

$$m_1 = \int_0^1 z\,dz = \frac{1}{2}$$

and the second moment

$$m_2 = \int_0^1 z^2\,dz = \frac{1}{3}.$$

Hence, the square of the standard deviation

$$\sigma^2 = m_2 - m_1^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.$$

This simple example may be used to illustrate a remark made at the beginning of this
chapter, that sometimes it is profitable to substitute for a variable with a finite but
large number of values a fictitious continuous variable. Suppose that in flipping a coin
n times, we mark heads by 1 and tails by 0, thus obtaining a sequence comprising n
units and zeros altogether, disposed in the order of trials. This sequence may be
considered as successive digits in the binary representation of a fraction:

$$X = \frac{\epsilon_1}{2} + \frac{\epsilon_2}{2^2} + \cdots + \frac{\epsilon_n}{2^n}$$

contained between 0 and 1. X may be considered as a stochastic variable with $2^n$
values each having the probability $1/2^n$. The probability $\Pi(\alpha, \beta)$ that X will be
contained in the interval $(\alpha, \beta)$, or more definitely that X will satisfy the inequalities

$$\alpha < X \le \beta,$$

is obviously obtained by multiplying the number of integers N contained in the limits

$$2^n\alpha < N \le 2^n\beta$$

by $1/2^n$. Now there are exactly

$$[2^n\beta] - [2^n\alpha] = 2^n(\beta - \alpha) + \theta; \qquad -1 < \theta < 1$$

such integers; hence

$$\Pi(\alpha, \beta) = \beta - \alpha + \frac{\theta}{2^n}.$$

If n is even moderately large, this probability is very near to the probability

$$P(\alpha, \beta) = \beta - \alpha$$

that a fictitious variable x with uniform distribution over the interval (0, 1) will
assume a value in the interval $(\alpha, \beta)$. The first two moments of the variable X are,
respectively

$$M_1 = \frac{0 + 1 + 2 + \cdots + (2^n - 1)}{2^{2n}} = \frac{1}{2} - \frac{1}{2^{n+1}}$$

$$M_2 = \frac{0^2 + 1^2 + 2^2 + \cdots + (2^n - 1)^2}{2^{3n}} = \frac{1}{3} - \frac{1}{2^{n+1}} + \frac{1}{3 \cdot 2^{2n+1}}$$


and differ little from the respective moments 1/2 and 1/3 of the fictitious continuous
variable. Without losing anything essential, we here gain considerably in sim- 
plicity by substituting a fictitious continuous variable for the discontinuous variable 
X. 
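
The comparison made in this example can be checked by direct computation. The
following is a minimal sketch, not part of the original text: the function names are
hypothetical, and the closed forms $1/2 - 1/2^{n+1}$ and
$1/3 - 1/2^{n+1} + 1/(3 \cdot 2^{2n+1})$ are those obtained from the standard sums of
the first integers and of their squares. Exact rational arithmetic is used so that the
agreement is not blurred by rounding.

```python
from fractions import Fraction


def discrete_moments(n):
    """Exact first and second moments of X = k / 2**n, k = 0, ..., 2**n - 1,
    each value having the probability 1 / 2**n."""
    size = 2 ** n
    values = [Fraction(k, size) for k in range(size)]
    first = sum(values) / size
    second = sum(v * v for v in values) / size
    return first, second


def closed_form_moments(n):
    """Closed forms for the same two moments (assumed, from the power sums)."""
    first = Fraction(1, 2) - Fraction(1, 2 ** (n + 1))
    second = (Fraction(1, 3) - Fraction(1, 2 ** (n + 1))
              + Fraction(1, 3 * 2 ** (2 * n + 1)))
    return first, second


if __name__ == "__main__":
    # The moments approach 1/2 and 1/3 as n grows.
    for n in (1, 4, 10):
        first, second = discrete_moments(n)
        print(n, float(first), float(second))
```

Already for n = 10 the first moment differs from 1/2 by less than a thousandth.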

Example 2. A thin bar can rotate freely about its middle point P. It is set in
motion and after several revolutions comes to a stop pointing toward a point X on a
line l. The position of the bar is determined by an angle $\theta$
formed by itself and the perpendicular PO dropped from P on l; $\theta$
varies between the limits $-\pi/2$ and $\pi/2$ and its distribution is
supposed to be uniform. The position of X is determined by
its distance OX = x from O, this distance being positive or negative
according as X is to the right or to the left of the point O.
It is required to find the distribution of the probability of x. The relation between $\theta$
and x is

$$x = a \tan\theta,$$

where a = PO, or, conversely,

$$\theta = \arctan\frac{x}{a}.$$

Fig. 2.


By differentiation we find the relation between $d\theta$ and dx:

$$d\theta = \frac{a\,dx}{a^2 + x^2}.$$

Now, by hypothesis, the probability that $\angle OPX$ will be contained between $\theta$ and
$\theta + d\theta$ is

$$\frac{d\theta}{\pi} = \frac{a\,dx}{\pi(a^2 + x^2)}.$$

And the probability that the distance of X from O will be contained between x and
x + dx is the same. Hence, the density of probability for the variable x is

$$f(z) = \frac{1}{\pi}\,\frac{a}{a^2 + z^2}$$

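and the probability corresponding to a finite interval (c, d) is given by

$$\frac{1}{\pi}\int_c^d \frac{a\,dz}{a^2 + z^2} = \frac{1}{\pi}\left(\arctan\frac{d}{a} - \arctan\frac{c}{a}\right).$$

For the whole range of variation of x

$$\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{a\,dz}{a^2 + z^2} = 1$$

as it should be. However, we cannot speak of the mean value of x or of moments of
higher order, since the integrals

$$\int_{-\infty}^{\infty} \frac{z\,dz}{a^2 + z^2}, \qquad \int_{-\infty}^{\infty} \frac{z^2\,dz}{a^2 + z^2}, \ldots$$

have no meaning. But the characteristic function $\varphi(t)$ exists and is given by

$$\varphi(t) = \frac{a}{\pi}\int_{-\infty}^{\infty} \frac{e^{itz}\,dz}{a^2 + z^2} = e^{-a|t|}.$$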
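
The probability attached to a finite interval in Example 2 can be checked numerically.
The sketch below is an illustration only, with hypothetical function names: it
integrates the density $a/(\pi(a^2 + z^2))$ over (c, d) by the midpoint rule and
compares the result with the arctangent expression.

```python
import math


def density(z, a):
    """Density of the distance x = a * tan(theta), theta uniform on (-pi/2, pi/2)."""
    return a / (math.pi * (a * a + z * z))


def interval_probability(c, d, a):
    """Closed form: (arctan(d/a) - arctan(c/a)) / pi."""
    return (math.atan(d / a) - math.atan(c / a)) / math.pi


def midpoint_integral(c, d, a, steps=100_000):
    """Midpoint-rule approximation of the integral of the density over (c, d)."""
    h = (d - c) / steps
    return h * sum(density(c + (i + 0.5) * h, a) for i in range(steps))


if __name__ == "__main__":
    print(midpoint_integral(-1.0, 2.0, 1.5), interval_probability(-1.0, 2.0, 1.5))
```

For the symmetric interval (-a, a) the closed form gives exactly 1/2, since the bar
then points within 45 degrees of the perpendicular PO.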


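Example 3. One of the most important distributions (theoretically and practically)
is the so-called "Gaussian" or "normal" distribution. The density of this
distribution is given by

$$f(z) = K e^{-h^2(z-a)^2}$$

with three parameters K, h, a. However, only two of these parameters are
independent, since we must have

$$\int_{-\infty}^{\infty} f(z)\,dz = K\int_{-\infty}^{\infty} e^{-h^2(z-a)^2}\,dz = \frac{K}{h}\int_{-\infty}^{\infty} e^{-u^2}\,du = \frac{K\sqrt{\pi}}{h} = 1;$$

whence

$$K = \frac{h}{\sqrt{\pi}}$$

and finally

$$f(z) = \frac{h}{\sqrt{\pi}}\,e^{-h^2(z-a)^2}.$$

To find the meaning of a and h we observe that the mean value of our variable is

$$\frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty} z\,e^{-h^2(z-a)^2}\,dz = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty} (z-a)\,e^{-h^2(z-a)^2}\,dz + a\cdot\frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-h^2(z-a)^2}\,dz = a$$

since

$$\int_{-\infty}^{\infty} (z-a)\,e^{-h^2(z-a)^2}\,dz = 0.$$

Thus a has the meaning of the mean value of the normally distributed variable x.
The square of the standard deviation is given by

$$\sigma^2 = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty} (z-a)^2 e^{-h^2(z-a)^2}\,dz = \frac{1}{2h^2}$$

whence

$$h = \frac{1}{\sigma\sqrt{2}}.$$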
Thus for the normally distributed variable with the mean a and standard deviation $\sigma$
the density of probability is

$$f(z) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(z-a)^2}{2\sigma^2}}.$$

Finally, for the variable u = x - a with the mean value 0 and the same standard
deviation, the expression of density takes the simplest form

$$f(z) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{z^2}{2\sigma^2}}$$

and the distribution function of probability is represented by the integral

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{z^2}{2\sigma^2}}\,dz.$$

The curve of density

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$$

or the probability curve has a bell-shaped form as shown in the figure corresponding
to $\sigma = 1$. It has a single maximum corresponding to x = 0 and on both sides of this
maximum it rapidly approaches the x axis.

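The characteristic function of normal distribution has a very simple form. By
definition

$$\varphi(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{itx - \frac{x^2}{2\sigma^2}}\,dx$$

and since

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\cos\beta x\,dx = \sqrt{\frac{\pi}{\alpha}}\,e^{-\frac{\beta^2}{4\alpha}} \qquad (\alpha > 0)$$

we find that

$$\varphi(t) = e^{-\frac{\sigma^2 t^2}{2}}.$$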


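The moments of normal distribution (with the mean = 0) can now be easily found.
From the definition of the characteristic function it follows that

$$\varphi(t) = \sum_{n=0}^{\infty} m_n\,\frac{(it)^n}{n!}.$$

In our case

$$\varphi(t) = e^{-\frac{\sigma^2 t^2}{2}} = \sum_{k=0}^{\infty} (-1)^k\,\frac{\sigma^{2k} t^{2k}}{2^k\,k!}.$$

Thus

$$m_{2k+1} = 0$$

$$m_{2k} = 1 \cdot 3 \cdot 5 \cdots (2k - 1)\,\sigma^{2k}.$$
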
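
The expressions for the moments of the normal distribution can be compared with a
direct numerical integration of $z^n f(z)$. The following sketch is only an
illustration under stated assumptions: the function names are hypothetical, the mean
is taken as 0, and the infinite range is truncated at 12 standard deviations, beyond
which the integrand is negligible.

```python
import math


def normal_moment_closed_form(n, sigma):
    """m_n for mean 0: zero for odd n, 1*3*5*...*(n-1) * sigma**n for even n."""
    if n % 2 == 1:
        return 0.0
    product_of_odds = 1
    for j in range(1, n, 2):
        product_of_odds *= j
    return product_of_odds * sigma ** n


def normal_moment_numeric(n, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule integral of z**n * f(z) over (-12 sigma, 12 sigma)."""
    low = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        z = low + (i + 0.5) * h
        total += z ** n * math.exp(-z * z / (2.0 * sigma * sigma))
    return total * h / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    for n in range(7):
        print(n, normal_moment_numeric(n, 1.5), normal_moment_closed_form(n, 1.5))
```
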
Case of Two or More Variables 
9. By analogy it is easy now to extend the notion of probability to
two or more variables considered simultaneously. A pair of special
values x, y of two stochastic variables X, Y will be represented
geometrically by a point with the coordinates x, y referred to a rectangular
system of axes. The domain S of all the possible values of X and Y will
be represented by a portion (finite or infinite) of a plane with a definite
boundary unless this domain coincides with the whole plane. The
probability that the point x, y should belong to an infinitesimal area
dx dy will be expressed by the product $\varphi(x, y)\,dx\,dy$ where the function
$\varphi(x, y)$ is again called the density of probability at the point x, y. The
density of probability must satisfy two requirements: it is non-negative
in the whole domain S and

$$\iint_S \varphi(x, y)\,dx\,dy = 1$$

where the double integral is extended over all the domain S. The
probability for the point x, y to be located in a given domain $\sigma$ is then
given by the integral

$$\iint_\sigma \varphi(x, y)\,dx\,dy$$

extended over $\sigma$.

If $\varphi(x, y)$ is a constant in S, the distribution of probability is called
uniform. The domain S in this case must be finite and if its area is
denoted by the same letter, then

$$\varphi(x, y) = \frac{1}{S}.$$

The probability for the point x, y to be within the domain $\sigma$ will be given
by the ratio

$$\frac{\sigma}{S},$$

denoting the area of the domain $\sigma$ by $\sigma$ again.

10. We can always substitute the whole plane for the domain S.
To that end it suffices to set

$$\varphi(x, y) = 0$$

in all points not belonging to S. We shall then have

$$\varphi(x, y) \ge 0$$

everywhere and

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \varphi(x, y)\,dx\,dy = 1.$$

By doing so we have the advantage of stating results in a perfectly general 
form without mentioning the domain S. However, in dealing with 
particular problems, it is more convenient to consider only those points 
which can actually represent simultaneous values of the variables. 
The probability of simultaneous inequalities

$$a < X < b; \qquad c < Y < d$$

according to the general definition is represented by the double integral

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy.$$

This corresponds to the compound probability of two events and we must
see that the fundamental theorem of compound probabilities continues
to hold. Taking $c = -\infty$, $d = +\infty$ the repeated integral

$$\int_a^b dx\int_{-\infty}^{\infty} \varphi(x, y)\,dy$$

represents the probability P(a, b) for the variable X (as if it were
considered alone without any reference to Y) to have its value in (a, b).
The function

$$f(x) = \int_{-\infty}^{\infty} \varphi(x, y)\,dy$$

represents the density of probability of X. Thus

$$P(a, b) = \int_a^b f(x)\,dx.$$

In a similar way

$$F(y) = \int_{-\infty}^{\infty} \varphi(x, y)\,dx$$

represents the density of the probability of Y; and the probability Q(c, d)
that this variable has its value in (c, d) is given by

$$Q(c, d) = \int_c^d F(y)\,dy.$$

Now the double integral

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy$$

can be written in either of the forms

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy = \int_a^b f(x)\,dx \cdot \int_c^d F_1(y)\,dy$$

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy = \int_c^d F(y)\,dy \cdot \int_a^b f_1(x)\,dx$$

where

$$F_1(y) = \frac{\int_a^b \varphi(x, y)\,dx}{\int_a^b f(x)\,dx}; \qquad f_1(x) = \frac{\int_c^d \varphi(x, y)\,dy}{\int_c^d F(y)\,dy}$$


may be considered as densities of conditional probabilities, respectively,
for Y when it is known that X has a value in (a, b) and for X when it is
known that Y has a value in (c, d). The preceding expressions for the
probability of the simultaneous inequalities

$$a < X < b, \qquad c < Y < d$$

have the same form as the theorem of compound probability and may be
considered as its extension. The conditional probability for Y to have
its value in (c, d) when it is known that X has its value in (a, b) is given by

$$\int_c^d F_1(y)\,dy.$$

Now, we define variables X and Y as independent when the probability
for Y to be in (c, d) is not affected by the knowledge that X belongs
to (a, b), which means that

$$\int_c^d F_1(y)\,dy = \int_c^d F(y)\,dy$$

or

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy = \int_c^d F(y)\,dy \cdot \int_a^b f(x)\,dx$$

and, since intervals (a, b) and (c, d) are arbitrary,

$$\varphi(x, y) = f(x) \cdot F(y)$$

at points of continuity. Hence, the density of probability for two
independent variables is a product of a function of x alone by a function
of y alone. Conversely, when this condition is satisfied the variables are
independent. For independent variables the probability of the
simultaneous inequalities

$$a < X < b, \qquad c < Y < d$$

has a simple expression

$$\int_a^b f(x)\,dx \cdot \int_c^d F(y)\,dy$$

which is the product of the probability for X to have its value in the
interval (a, b) by the probability for Y to have its value in the interval
(c, d), in perfect analogy with the compound probability of two
independent events.

Finally, the mathematical expectation of any function $\psi(x, y)$ can be
defined by

$$E(\psi(x, y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \psi(x, y)\,\varphi(x, y)\,dx\,dy$$

provided the integral in the right member exists.

11. It is hardly necessary to dwell at length upon the case of several
stochastic variables. A system of particular values $x_1, x_2, \ldots x_n$ of
n stochastic variables $X_1, X_2, \ldots X_n$ may be considered as a point in
n-dimensional space. The density of probability is a non-negative
function $\varphi(x_1, x_2, \ldots x_n)$ defined in the whole space and satisfying the
condition

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2 \cdots dx_n = 1.$$

The probability for a point representing $X_1, X_2, \ldots X_n$ to be located
in a given domain $\sigma$ is given by the integral

$$\int\cdots\int_\sigma \varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2 \cdots dx_n$$

extended over $\sigma$. In the case of uniform distribution of probability,
$\varphi(x_1, x_2, \ldots x_n)$ is by definition a constant in a certain finite region
of space and = 0 outside of that region. If V is the volume of that
region and v the volume of the domain $\sigma$, the ratio v/V gives the
probability that a point belongs to $\sigma$.

The probability of the simultaneous inequalities

$$a_1 < X_1 < b_1; \quad a_2 < X_2 < b_2; \quad \ldots \quad a_n < X_n < b_n$$

is given by the integral

$$\int_{a_1}^{b_1}\cdots\int_{a_n}^{b_n} \varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2 \cdots dx_n$$

which, by introduction of the conditional probabilities as in the case of
two variables, can be put into the form of a product of n integrals in a
manner perfectly analogous to the expression of the probability of a
compound event with n components. Finally, the variables are
independent if the density $\varphi(x_1, x_2, \ldots x_n)$ is a product of n functions
depending only upon $x_1, x_2, \ldots x_n$, respectively, and conversely.

The expression

$$E[\psi(x_1, x_2, \ldots x_n)] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \psi\,\varphi\,dx_1\,dx_2 \cdots dx_n$$

serves to define the mathematical expectation of any function
$\psi(x_1, x_2, \ldots x_n)$ of $X_1, X_2, \ldots X_n$.

12. Since in introducing the extended idea of probability we took 
care to preserve the fundamental theorems of the calculus of probability, 
we may be sure that other theorems derived from them will hold for 
continuous variables. In particular, theorems concerning mathematical 
expectation and the fundamental lemma in Chap. X, Sec. 1, hold for 
continuous variables. Upon this basis as we have seen was built the 
proof of the law of large numbers. Hence, this important theorem 
applies equally to continuous variables. 


Geometrical Problems

13. A few geometrical problems will afford a good illustration of the 
foregoing general principles. 

Problem 1. A rectilinear segment AB is divided by a point C into
two parts AC = a, CB = b. Points X and Y are taken at random on AC
and CB, respectively. What is the probability that AX, XY, BY can
form a triangle?

Solution. We must first agree upon the meaning of the expression
"at random." The idea suggested by this expression implies that the
way of selecting points X and Y gives no preference to
any point of AC and CB, respectively. Consequently,
variables x = AX and y = BY may be assumed to have
uniform distribution of probability. The domain of the
point x, y is a rectangle OMPN with the sides OM = a,
ON = b. In order that AX, XY, BY can form a triangle
the following inequalities must be fulfilled:

$$x < (a + b - x - y) + y \quad \text{or} \quad x < a + b - x$$

$$y < (a + b - x - y) + x \quad \text{or} \quad y < a + b - y$$

$$a + b - x - y < x + y.$$

These inequalities are equivalent to

$$x < \frac{a + b}{2}, \qquad y < \frac{a + b}{2}, \qquad x + y > \frac{a + b}{2}.$$

Fig. 6.

To interpret them geometrically, through P draw a line QPR making
$\angle RQO = 45°$. From the mid-point V of QR drop the perpendiculars
VS, VW on OX, OY. Then the preceding inequalities limit the position
of the point x, y to the shaded area SVW, whose part TSU is contained
in the rectangle OMPN. Variables x and y are independent and have
uniform distribution. Hence, the density of probability of the pair
x, y is constant and the probability that the point x, y is in the triangle
TSU will be

$$\frac{\text{Area } TSU}{\text{Area } OMPN} = \frac{\frac{1}{2}\,b \cdot b}{ab} = \frac{1}{2}\,\frac{b}{a}.$$

At the same time this is the probability for AX, XY, BY to form a
triangle.
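
The value b/(2a) can be checked by counting points of a fine grid laid over the
rectangle OMPN. This is a rough sketch with hypothetical names, not part of the
original solution; it assumes b does not exceed a, which is the situation the
triangle TSU describes, and it tests the three triangle inequalities directly rather
than their reduced form.

```python
def can_form_triangle(p, q, r):
    """True when three lengths satisfy the strict triangle inequalities."""
    return p < q + r and q < p + r and r < p + q


def triangle_probability(a, b, grid=300):
    """Fraction of grid midpoints (x, y) of the rectangle 0<x<a, 0<y<b
    for which AX = x, XY = a + b - x - y, BY = y can form a triangle."""
    hits = 0
    for i in range(grid):
        x = (i + 0.5) * a / grid
        for j in range(grid):
            y = (j + 0.5) * b / grid
            if can_form_triangle(x, a + b - x - y, y):
                hits += 1
    return hits / (grid * grid)


if __name__ == "__main__":
    a, b = 3.0, 2.0
    print(triangle_probability(a, b), b / (2 * a))
```
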

Problem 2. On a line AB two points $X_1$, $X_2$ are taken at random.
What is the probability that $AX_1$, $X_1X_2$, $X_2B$ can form a triangle?



Fig. 6. Fig. 7. 

Solution. Variables $AX_1 = x_1$, $AX_2 = x_2$ are independent and have
uniform distribution of probability. The domain of all possible positions
of the point $x_1, x_2$ is a square with the side AB = l. Positions of this
point when $AX_1$, $X_1X_2$, $X_2B$ form a triangle can be characterized as
follows. First, if $X_1$ precedes $X_2$, we have

$$x_2 - x_1 < x_1 + l - x_2 \quad \text{or} \quad x_2 - x_1 < \frac{l}{2}$$

$$x_1 < x_2 - x_1 + l - x_2 \quad \text{or} \quad x_1 < \frac{l}{2}$$

$$l - x_2 < x_2 - x_1 + x_1 \quad \text{or} \quad x_2 > \frac{l}{2}$$

which means that $x_1, x_2$ belongs to the triangle OPN, the definition of
which is evident if L, M, N, P are mid-points of the sides of the square
ABCD. Second, if $X_1$ follows $X_2$, we have

$$x_1 - x_2 < \frac{l}{2}; \qquad x_2 < \frac{l}{2}; \qquad x_1 > \frac{l}{2}$$

and these inequalities define the area OLM. Since the distribution of
$x_1, x_2$ is uniform, the required probability is

$$\frac{\text{Area } OLM + \text{Area } ONP}{\text{Area } ABCD} = \frac{\frac{1}{4}\,l^2}{l^2} = \frac{1}{4}.$$
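
The same grid count applies to Problem 2. In the sketch below (hypothetical names,
offered as an illustration only) the three pieces are formed from the ordered pair of
points, and use is made of the fact that three lengths with the sum l form a triangle
exactly when each of them is less than l/2.

```python
def three_piece_probability(l=1.0, grid=300):
    """Fraction of grid midpoints (x1, x2) of the square with side l for which
    the pieces AX1, X1X2, X2B (in the order of the points) form a triangle."""
    hits = 0
    for i in range(grid):
        x1 = (i + 0.5) * l / grid
        for j in range(grid):
            x2 = (j + 0.5) * l / grid
            low, high = min(x1, x2), max(x1, x2)
            pieces = (low, high - low, l - high)
            # The pieces sum to l, so a triangle exists iff the largest is < l/2.
            if max(pieces) < l / 2:
                hits += 1
    return hits / (grid * grid)


if __name__ == "__main__":
    print(three_piece_probability())
```
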

Problem 3. A chord is drawn at random in a given circle. What is
the probability that it is greater than the side of the equilateral triangle
inscribed in that circle?

Solution 1. The position of the chord drawn at random can be
determined by its distance from the center of the circle. This distance may
vary between 0 and R, the radius of the circle. The chord is greater
than the side of the equilateral triangle inscribed in the circle if its
distance from the center is less than R/2. Hence, the required probability

$$p_1 = \frac{1}{2}.$$

Solution 2. Through one end of the chord, draw a tangent AT.
The angle $\varphi$ varying from 0° to 180° determines the position of the
chord. If it is greater than the side of the inscribed equilateral
triangle, the angle $\varphi$ must lie between 60° and 120°.
Hence the answer

$$p_2 = \frac{120° - 60°}{180°} = \frac{1}{3}.$$

Fig. 8.


The fact that we obtain two different numbers for the same probability
seems paradoxical, and the problem itself is known as "Bertrand's
paradox." However, going attentively over both solutions, we discover
that we are really dealing with two different problems. In the first
solution it was assumed that the distance of the chord from the center
has uniform distribution, while in the second solution the distribution
of the angle $\varphi$ was taken as uniform. The second solution may be
considered reasonable if a thin bar or a needle can rotate freely about A
and if, being set in motion, it determines the chord AB by its ultimate
position. On the other hand, the first solution is acceptable if a circular
disk is thrown upon a board ruled with parallel lines distant from one
another by the diameter of the disk. The intersection of the disk with
one of the lines determines a chord, and the probability that it is greater
than the side of the inscribed equilateral triangle can reasonably be
assumed to be 1/2.

A general remark applies to all problems of this kind. When a 
certain geometrical element, such as a point or a line, is supposed to be 
taken at random, it should be clearly indicated by what kind of 
mechanism this is to be done. Only then the hypothetically assumed 
distribution can be put to an experimental test and either confirmed 
(approximately) or rejected. 
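
The two mechanisms can be imitated by a small simulation, which makes plain that they
are two different experiments. The sketch below is an illustration with hypothetical
names; it uses the elementary facts that a chord at the distance t from the center
has the length $2\sqrt{R^2 - t^2}$ and that a chord making the angle $\varphi$ with the
tangent at one of its ends has the length $2R\sin\varphi$. The random generator is
seeded so that the run is repeatable.

```python
import math
import random


def chord_from_distance(distance, radius):
    """Length of a chord whose distance from the center is given."""
    return 2.0 * math.sqrt(radius * radius - distance * distance)


def chord_from_tangent_angle(phi, radius):
    """Length of a chord making the angle phi with the tangent at its end."""
    return 2.0 * radius * math.sin(phi)


def bertrand_frequencies(trials=100_000, radius=1.0, seed=0):
    """Frequencies of 'chord longer than the side of the inscribed equilateral
    triangle' under the two hypotheses of uniform distribution."""
    rng = random.Random(seed)
    side = radius * math.sqrt(3.0)
    by_distance = sum(
        chord_from_distance(rng.uniform(0.0, radius), radius) > side
        for _ in range(trials)
    ) / trials
    by_angle = sum(
        chord_from_tangent_angle(rng.uniform(0.0, math.pi), radius) > side
        for _ in range(trials)
    ) / trials
    return by_distance, by_angle


if __name__ == "__main__":
    print(bertrand_frequencies())
```

The first frequency settles near 1/2 and the second near 1/3, in agreement with the
two solutions.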

14. Buffon's Needle Problem. A board is ruled with equidistant
parallel lines, the width of the strip between two consecutive lines being
d. A needle so fine that it can be likened to a rectilinear segment of the
length l < d is thrown on the board. What is the probability that the
needle will intersect one of the lines (naturally not more than one)?

Solution. This is the oldest problem dealing with geometrical
probabilities. It was mentioned by Buffon, the celebrated French
naturalist of the eighteenth century, in the Proceedings of the Paris
Academy of Sciences (1733) and later reproduced with its solution in
Buffon's book "Essai d'arithmétique morale," published in 1777.

Let us determine the position of the needle by the distance OP = x of
its middle point from the nearest line, and the acute angle $\varphi$ between OP
and the needle. Variables x and $\varphi$ may be considered as independent.
Furthermore, x and $\varphi$ vary respectively between 0 and d/2, and 0 and
$\pi/2$. As a hypothesis we assume the distribution of probability for
x and $\varphi$ as uniform. The domain of x, $\varphi$ is a rectangle OABC with
OA = $\pi/2$, OC = d/2. Now, the needle intersects one of the lines if

$$x < \frac{l}{2}\cos\varphi$$

and then the point x, $\varphi$ lies in the shaded area below the curve

$$x = \frac{l}{2}\cos\varphi.$$

Fig. 9. Fig. 10.

Since the distribution of x, $\varphi$ is uniform, the required probability will be

$$p = \frac{\text{Area } OAD}{\text{Area } OABC}.$$

But

$$\text{Area } OAD = \frac{l}{2}\int_0^{\pi/2} \cos\varphi\,d\varphi = \frac{l}{2}$$

$$\text{Area } OABC = \frac{\pi}{2} \cdot \frac{d}{2}$$

and consequently

$$p = \frac{2l}{\pi d}.$$
On pages 112-113 an account was given of experiments made by several 
authors in connection with Buffon's problem. They all show good agree- 
ment with the theory and indirectly confirm the hypothesis assumed in 
deriving the above expression for probability. 
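
Such an experiment is easily imitated. The sketch below is an illustration under the
hypothesis of the text (x uniform between 0 and d/2, $\varphi$ uniform between 0 and
$\pi/2$, the two independent); the function names and the seed are arbitrary choices,
not taken from the original.

```python
import math
import random


def buffon_probability(l, d):
    """Theoretical probability 2l / (pi d) for a needle shorter than d."""
    if not 0 < l < d:
        raise ValueError("this formula assumes 0 < l < d")
    return 2.0 * l / (math.pi * d)


def buffon_frequency(l, d, trials=200_000, seed=1):
    """Frequency of intersections in simulated throws of the needle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, d / 2.0)          # distance of the middle point
        phi = rng.uniform(0.0, math.pi / 2.0)  # acute angle with OP
        if x < (l / 2.0) * math.cos(phi):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(buffon_frequency(1.0, 2.0), buffon_probability(1.0, 2.0))
```
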

15. Extension of Buffon's Problem. A thin plate in the shape of a
convex polygon, of dimensions so small that it cannot intersect two of
the lines simultaneously, is thrown on a board ruled as in Buffon's needle
problem. What is the probability that the boundary of the plate will
intersect one of the lines?

Solution. Suppose that the polygonal boundary has five sides.
Let these sides (and their lengths) be denoted by

$$\alpha, \beta, \gamma, \delta, \epsilon.$$

Each of them is shorter than the distance d between two consecutive
lines. On account of convexity, a line can intersect either none or two
(and only two) sides. Accordingly, combining sides in pairs, we can
distinguish 10 mutually exclusive cases and denote their probabilities by

$$(\alpha\beta), (\alpha\gamma), (\alpha\delta), (\alpha\epsilon), (\beta\gamma), (\beta\delta), (\beta\epsilon), (\gamma\delta), (\gamma\epsilon), (\delta\epsilon).$$

The required probability will be given by the sum

$$p = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\epsilon) + (\beta\gamma) + (\beta\delta) + (\beta\epsilon) + (\gamma\delta) + (\gamma\epsilon) + (\delta\epsilon).$$


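On the other hand, the side $\alpha$ can be intersected by a line in four mutually
exclusive ways; namely, together with $\beta$, or $\gamma$, or $\delta$, or $\epsilon$. Hence, if $(\alpha)$ is
the probability of intersection

$$(\alpha) = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\epsilon),$$

and similarly

$$(\beta) = (\beta\alpha) + (\beta\gamma) + (\beta\delta) + (\beta\epsilon)$$

$$(\gamma) = (\gamma\alpha) + (\gamma\beta) + (\gamma\delta) + (\gamma\epsilon)$$

$$(\delta) = (\delta\alpha) + (\delta\beta) + (\delta\gamma) + (\delta\epsilon)$$

$$(\epsilon) = (\epsilon\alpha) + (\epsilon\beta) + (\epsilon\gamma) + (\epsilon\delta),$$

whence

$$(\alpha) + (\beta) + (\gamma) + (\delta) + (\epsilon) = 2p.$$

But

$$(\alpha) = \frac{2\alpha}{\pi d}, \quad (\beta) = \frac{2\beta}{\pi d}, \quad \ldots \quad (\epsilon) = \frac{2\epsilon}{\pi d},$$

and consequently

$$p = \frac{\alpha + \beta + \gamma + \delta + \epsilon}{\pi d} = \frac{P}{\pi d}$$

where P is the perimeter of the polygonal boundary. Evidently this
result is perfectly general. Since it does not depend upon the number of
sides, by passage to the limit, it can be extended to the case of a plate
bounded by any convex curve.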
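
The result can be checked numerically by a different route from the one followed in
the text. For a fixed orientation a small convex plate meets a line exactly when a
line falls within its projection on the direction perpendicular to the lines, which
happens with the probability w/d if w is the width of that projection; averaging over
all orientations should then reproduce $P/(\pi d)$. The sketch below assumes, as in
Sec. 14, uniform distribution of position and orientation; the function names and the
sample triangle are hypothetical.

```python
import math


def perimeter(vertices):
    """Perimeter of a polygon given by its vertices in order."""
    total = 0.0
    for k in range(len(vertices)):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % len(vertices)]
        total += math.hypot(x1 - x0, y1 - y0)
    return total


def mean_width(vertices, steps=3600):
    """Average, over directions in (0, pi), of the width of the projection."""
    total = 0.0
    for k in range(steps):
        theta = math.pi * (k + 0.5) / steps
        c, s = math.cos(theta), math.sin(theta)
        projections = [x * c + y * s for x, y in vertices]
        total += max(projections) - min(projections)
    return total / steps


def crossing_probability(vertices, d):
    """Probability that the plate meets a line; its diameter must be below d."""
    return mean_width(vertices) / d


if __name__ == "__main__":
    plate = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]  # sides 3, 4, 5
    print(crossing_probability(plate, 10.0), perimeter(plate) / (math.pi * 10.0))
```
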

16. Second Solution of Buffon's Problem. Barbier has given another
extremely ingenious solution of Buffon's problem and of its extension.
Let f(l) be an unknown probability that the needle will intersect a line.
Imagine that the needle is divided into two parts l' and l''. Evidently a
line intersects the needle if, and only if, it intersects either the first or
the second part. Hence, by the theorem of total probabilities

$$f(l) = f(l') + f(l''),$$

whence, as in Sec. 4, we conclude

$$f(l) = Cl$$

where C is a constant independent of l. The whole question is how to
determine this constant. Barbier's ingenious idea was to let this
problem depend on the solution of another one: A polygonal line (convex
or not) is thrown upon the board; what is the mathematical expectation
of the number of points of intersection? The perimeter of the polygonal
line can be subdivided into n rectilinear parts $a_1, a_2, \ldots a_n$ all less than
d. With these n parts we can associate n variables $x_1, x_2, \ldots x_n$ such
that

$x_i = 1$ if one of the lines intersects $a_i$,
$x_i = 0$ otherwise.

The sum

$$s = x_1 + x_2 + \cdots + x_n$$

evidently gives the total number of the points of intersection. Hence

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n)$$

and, if $p_i$ is the probability of intersection of $a_i$ with one (and only one)
line,

$$E(x_i) = p_i.$$

But, according to the previous result,

$$p_i = Ca_i.$$

Hence, we have a perfectly general formula

$$E(s) = C(a_1 + a_2 + \cdots + a_n) = CP$$

where P is the perimeter of the polygonal line. The result holds for any
curvilinear arc (closed or not) as can be seen by the method of limits.
This formula applied to a circle with the diameter d gives

$$C \cdot \pi d = 2$$

since such a circle has always exactly two points of intersection with
the lines of the system. Thus we find that

$$C = \frac{2}{\pi d}$$

and

$$f(l) = \frac{2l}{\pi d}$$

as obtained before. For a closed convex line of sufficiently small
dimensions only two cases are possible: two intersections (probability p), or
none (probability 1 - p), whence E(s) = 2p and

$$2p = \frac{2P}{\pi d}$$

or

$$p = \frac{P}{\pi d}$$

in agreement with the result obtained in Sec. 15.

17. Laplace's Problem. A board is covered with a set of congruent
rectangles as shown in the figure, and a thin needle is
thrown on the board. Supposing that the needle is shorter
than the smaller sides of the rectangles, find the probability
that the needle will be entirely contained in one of the
rectangles of the set.

Solution. Let AB = a, AD = b be the sides of the rectangle which
contains the middle point of the needle, the length of which is

$$l \qquad (l < a,\ l < b).$$

Taking AB and AD for coordinate axes, the position of the needle is
determined by two coordinates x, y of its middle point
and the angle $\varphi$ formed by the needle with the x axis.
We may consider x, y, $\varphi$ as three independent variables
with uniform distribution of probability. The domain
filled up with all possible points x, y, $\varphi$ is a
parallelepipedon

$$0 < x < a; \qquad 0 < y < b; \qquad -\frac{\pi}{2} < \varphi < \frac{\pi}{2}$$

Fig. 12.


and the distribution of probability throughout this domain is uniform.
To characterize the domain of points representing positions of the
middle point of the needle when it is located entirely within ABCD we
consider the sections of that domain by planes $\varphi$ = constant and their
projections on the plane xy. These projections are represented by
the shaded areas in Figs. 13 and 14 corresponding to positive and negative
$\varphi$, respectively.

Fig. 13. Fig. 14.

In Fig. 13

$$\angle PAB = \varphi; \qquad AP \parallel BF \parallel CR \parallel DG$$

and

$$AP = BE = BF = CR = DG = DH = \tfrac{1}{2}l.$$

Similarly, in the second figure

$$\angle JAB = \varphi; \qquad AJ \parallel BQ \parallel CL \parallel DS$$

and

$$AJ = AK = BQ = CL = CM = DS = \tfrac{1}{2}l.$$

The area of the rectangle PQRS corresponding to these two cases can be
expressed as follows:

$$\text{Area } PQRS = (a - l\cos\varphi)(b - l\sin\varphi) = ab - l(b\cos\varphi + a\sin\varphi) + l^2\sin\varphi\cos\varphi$$

$$\text{Area } PQRS = (a - l\cos\varphi)(b + l\sin\varphi) = ab - l(b\cos\varphi - a\sin\varphi) - l^2\sin\varphi\cos\varphi.$$

Without distinguishing positive and negative values of $\varphi$ we may write

$$F(\varphi) = \text{area } PQRS = ab - bl\cos\varphi - al\,|\sin\varphi| + \tfrac{1}{2}l^2\,|\sin 2\varphi|.$$

The volume of the domain representing positions of the needle entirely
within ABCD is:

$$v = \int_{-\pi/2}^{\pi/2} F(\varphi)\,d\varphi = \pi ab - 2bl - 2al + l^2$$

while

$$V = \pi ab$$

is the volume of the domain

$$0 < x < a, \qquad 0 < y < b, \qquad -\frac{\pi}{2} < \varphi < \frac{\pi}{2}.$$

Hence, the required probability is:

$$\frac{v}{V} = 1 - \frac{2l(a + b) - l^2}{\pi ab}$$

and the complementary probability for the needle to intersect the
boundary of one of the rectangles is:

$$p = \frac{2l(a + b) - l^2}{\pi ab}.$$

Buffon's problem may be considered as a limiting case when $a = \infty$
and, indeed, by setting $a = \infty$, we find that

$$p = \frac{2l}{\pi b}$$

in conformity with the result in Sec. 14.
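
The integration leading to the volume v can be verified numerically. The following
sketch (hypothetical names, midpoint rule) integrates the area of the section
$F(\varphi)$ over the range of $\varphi$ and compares the resulting probability with the
closed expression; it is meant as a check of the arithmetic only.

```python
import math


def section_area(phi, a, b, l):
    """Area F(phi) of the section of the domain by the plane phi = constant."""
    return (a * b - b * l * math.cos(phi) - a * l * abs(math.sin(phi))
            + 0.5 * l * l * abs(math.sin(2.0 * phi)))


def inside_probability_numeric(a, b, l, steps=20_000):
    """Volume of the favorable domain, by the midpoint rule, divided by pi*a*b."""
    h = math.pi / steps
    volume = h * sum(
        section_area(-math.pi / 2.0 + (k + 0.5) * h, a, b, l)
        for k in range(steps)
    )
    return volume / (math.pi * a * b)


def inside_probability_closed(a, b, l):
    """Closed form 1 - (2l(a + b) - l^2) / (pi a b), valid for l < a and l < b."""
    return 1.0 - (2.0 * l * (a + b) - l * l) / (math.pi * a * b)


if __name__ == "__main__":
    print(inside_probability_numeric(3.0, 2.0, 1.0),
          inside_probability_closed(3.0, 2.0, 1.0))
```
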

These examples may suffice to give an idea of problems in geometric 
probabilities. Sylvester, Crofton, and others have enriched this field 
by extremely ingenious methods of evaluating, or rather of avoiding 
evaluations, of very complicated multiple integrals. However, from the 
standpoint of principles, these investigations, ingenious as they are, 
do not contribute much to the general theory of probability. 


Fig. 15.


Problems for Solution 

1. A point X is taken at random on a rectilinear segment AB = l whose middle
point is O. What is the probability that AX, BX, and AO can form a triangle? The
distribution of AX = x is assumed to be uniform. Ans.

2. Two points $X_1$, $X_2$ are taken at random on AB = l.
Assuming uniform distribution of probability, what is the mathematical
expectation of any power n of the distance between $X_1$ and $X_2$?

Ans. $\dfrac{1}{l^2}\displaystyle\int_0^l\int_0^l |x_1 - x_2|^n\,dx_1\,dx_2 = \dfrac{2l^n}{(n+1)(n+2)}.$

3. Three points $X_1$, $X_2$, $X_3$ are taken at random on AB. What is the probability
that $X_3$ lies between $X_1$ and $X_2$?

Ans. assuming uniform distribution of probability. 

4. A rectilinear segment AB is divided into four equal parts

$$AC = CO = OD = DB.$$

Supposing that the distribution of probability is symmetric with respect to O, let P
be the probability that a point selected at random on AB will be between C and D.
Also, let Q be the probability that the middle point between
two points selected at random will be between C and D. Prove
that

$$Q > \frac{1 + P^2}{2}.$$

Fig. 16.

Hint: The middle point of a segment $X_1X_2$ is surely between C and D if: (i) $X_1$
and $X_2$ are in CO; or (ii) $X_1$ and $X_2$ are in OD; or (iii) $X_1$ and $X_2$ are on opposite sides
of O.

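5. Two points $X_1$, $X_2$ are chosen at random in a circle of radius r. Assuming
uniform distribution of probability, what is the mathematical expectation of their
distance? Ans. Denoting the required mathematical expectation by M, we have

$$\pi^2 r^4 M = \int_0^{2\pi}\int_0^{2\pi} F(r, \theta, \theta')\,d\theta\,d\theta'$$

where

$$F(r, \theta, \theta') = \int_0^r\int_0^r \sqrt{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\theta - \theta')}\;\rho\rho'\,d\rho\,d\rho'.$$

Hence, varying r by dr

$$dF = 2r\,dr\int_0^r \sqrt{r^2 + \rho^2 - 2r\rho\cos(\theta - \theta')}\;\rho\,d\rho$$

and

$$d(\pi^2 r^4 M) = 4\pi r\,dr\int_0^{2\pi}\int_0^r \sqrt{r^2 + \rho^2 - 2r\rho\cos\omega}\;\rho\,d\rho\,d\omega.$$

By introduction of new polar coordinates the integral in the right member can be
exhibited as

$$\int_{-\pi/2}^{\pi/2} d\omega\int_0^{2r\cos\omega} u^2\,du = \frac{8r^3}{3}\int_{-\pi/2}^{\pi/2} \cos^3\omega\,d\omega = \frac{32}{9}\,r^3.$$

Fig. 17.

Thus

$$d(\pi^2 r^4 M) = \frac{128\pi}{9}\,r^4\,dr$$

whence

$$M = \frac{128r}{45\pi}.$$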


6. A board is covered with congruent rectangles as in Laplace's problem. A coin
the diameter of which is less than the smaller side of the rectangles is thrown on the
board. What is the probability that it will be partly in one rectangle and partly in
another? Ans. a, b, r being respectively the sides of the rectangles and radius of the
coin, the required probability is

$$\frac{2r(a + b - 2r)}{ab}.$$

7. Solve Buffon's problem when the needle is longer than the distance between
two consecutive lines. Ans. The probability for the needle to intersect at least one
line is

$$p = \frac{2l}{\pi d}(1 - \sin\varphi_0) + \frac{2\varphi_0}{\pi}$$

where $\varphi_0$ is determined by $\cos\varphi_0 = d/l$.

8. A board is covered with congruent triangles whose sides are a, b, c. A needle
whose length is less than the shortest altitude of any one of these triangles is thrown
on the board. What is the probability that the needle will be contained entirely
within one of the triangles? Ans. The required probability is

$$1 + \frac{(Aa^2 + Bb^2 + Cc^2)\,l^2}{2\pi Q^2} - \frac{(4a + 4b + 4c - 3l)\,l}{2\pi Q}$$

where A, B, C are angles opposite to sides a, b, c and Q is double the area of the triangle.
For equilateral triangles


9. On each of the circles $O_1, O_2, O_3, \ldots$ with respective radii $r_1, r_2, r_3, \ldots$
points $M_1, M_2, M_3, \ldots$ are taken at random. Supposing that the series

$$r_1 + r_2 + r_3 + \cdots$$

is divergent, while the series

$$r_1^2 + r_2^2 + r_3^2 + \cdots$$

is convergent, prove that the probability that the length of the vector

$$OM = O_1M_1 + O_2M_2 + O_3M_3 + \cdots + O_nM_n$$

will be > R tends to 0 as $R \to \infty$ no matter how large n is.

Indication of Solution. Let $x_1, x_2, \ldots x_n$; $y_1, y_2, \ldots y_n$ be components of
$O_1M_1, O_2M_2, \ldots O_nM_n$ on two rectangular axes OX, OY. Then

$$E(x_i) = E(y_i) = 0$$

$$E(x_i^2) = E(y_i^2) = \frac{r_i^2}{2}.$$

Fig. 18.

By Tshebysheff's lemma (Chap. X, Sec. 1) the probabilities Q and Q' of the inequalities

$$|x_1 + x_2 + \cdots + x_n| > t\sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}$$

$$|y_1 + y_2 + \cdots + y_n| > t\sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}$$

are both less than $1/t^2$. Now, if the length OM > R then either

$$|x_1 + x_2 + \cdots + x_n| > \frac{R}{\sqrt{2}}$$

or

$$|y_1 + y_2 + \cdots + y_n| > \frac{R}{\sqrt{2}}.$$

Hence, the probability P for the length of OM to be > R is less than Q + Q';
that is,

$$P < \frac{2(r_1^2 + r_2^2 + \cdots + r_n^2)}{R^2}.$$

10. Prove that

$$\lim_{n \to \infty}\int_0^1\int_0^1\cdots\int_0^1 \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n}\,dx_1\,dx_2 \cdots dx_n = \frac{2}{3}.$$

Hint: Considering $x_1, x_2, \ldots x_n$ as continuous stochastic variables with uniform
distribution over the interval (0, 1) prove with the help of Tshebysheff's inequality
that the probability of

$$\frac{2}{3} - \epsilon < \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n} < \frac{2}{3} + \epsilon$$

for any $\epsilon > 0$ tends to 1 as $n \to \infty$.

CHAPTER XIII


THE GENERAL CONCEPT OF DISTRIBUTION 

1. In dealing with continuous stochastic variables we have introduced
the important concept of the function of distribution. Denoting the
density of probability by f(z), this function was defined by

$$F(t) = \int_{-\infty}^{t} f(z)\,dz$$

and it represents the probability of the inequality

$$x < t.$$

For a variable with a finite number of values the function of
distribution can be defined as the sum

$$F(t) = \sum_{x_i < t} p_i$$

where $p_1, p_2, \ldots p_n$ are respective probabilities of all possible values
$x_1, x_2, \ldots x_n$ of the variable x. The notation $x_i < t$ is intended to
show that the summation is extended over all values of x less than t.
Again, F(t) for any real t represents the probability of the inequality

$$x < t.$$

In this case F(t) is a discontinuous function, never decreasing and varying
between $F(-\infty) = 0$ and $F(+\infty) = 1$. Its discontinuities are located
at the points $x_1, x_2, \ldots x_n$ and are such that

$$F(x_i + 0) - F(x_i - 0) = p_i,$$

denoting, in the customary way,

$$F(x_i + 0) = \lim F(x_i + \epsilon)$$

$$F(x_i - 0) = \lim F(x_i - \epsilon)$$

when $\epsilon$, through positive values, converges to 0. To represent F(t)
graphically we note that

$$\begin{aligned}
F(t) &= 0 && \text{for } t \le x_1 \\
F(t) &= p_1 && \text{for } x_1 < t \le x_2 \\
F(t) &= p_1 + p_2 && \text{for } x_2 < t \le x_3 \\
&\;\;\vdots \\
F(t) &= p_1 + p_2 + \cdots + p_n && \text{for } x_n < t.
\end{aligned}$$

As for the value of F(t) at the point $t = x_i$, it is $F(x_i - 0)$. Hence,
the graph of F(t) consists of rectilinear segments as shown in the figure
(for n = 4; $x_1 = -2$; $x_2 = 0$; $x_3 = 1$; $x_4 = 3$; $p_1 = p_2 = p_3 = p_4 = 1/4$)
and belongs to the so-called step lines.
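
The step line of this example is easily tabulated. The sketch below is an
illustration only; the function name is hypothetical, and the strict inequality
$x_i < t$ in the sum is what makes the value at a point of discontinuity equal to
$F(x_i - 0)$.

```python
from fractions import Fraction


def distribution_function(values, probabilities):
    """Return F, where F(t) is the sum of p_i over all values x_i < t."""
    def F(t):
        return sum(p for x, p in zip(values, probabilities) if x < t)
    return F


# The example of the text: n = 4, values -2, 0, 1, 3, each with probability 1/4.
F = distribution_function([-2, 0, 1, 3], [Fraction(1, 4)] * 4)

if __name__ == "__main__":
    for t in (-3, -2, -1, 0, 0.5, 1, 2, 3, 4):
        print(t, F(t))
```
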

Thus, in case of a continuous variable the distribution function is
given by an integral, and in case of a discontinuous variable, by a sum.
In stating theorems equally true for continuous and discontinuous
variables, it would be tedious always to distinguish these two cases.
The question naturally arises whether it is possible to represent
distribution functions, moments, and similar quantities by using new symbols
equally applicable to continuous and discontinuous variables. In a
similar kind of investigation Stieltjes was confronted with the same
difficulties and he succeeded in overcoming them by introducing a new
kind of integrals known as Stieltjes' integrals.

Fig. 19.

Stieltjes' Integrals

2. Let $\varphi(x)$ be a never decreasing function defined in the interval
$a \le x \le b$. For any particular value $x_0$ of the argument both the limits
(for $\epsilon$ converging to 0 through positive values)

$$\lim \varphi(x_0 + \epsilon) = \varphi(x_0 + 0)$$
$$\lim \varphi(x_0 - \epsilon) = \varphi(x_0 - 0)$$

exist. Since evidently

$$\varphi(x_0 - 0) \le \varphi(x_0) \le \varphi(x_0 + 0),$$

$x_0$ is a point of continuity of $\varphi(x)$ if

$$\varphi(x_0 - 0) = \varphi(x_0 + 0).$$

If, however,

$$\varphi(x_0 - 0) < \varphi(x_0 + 0)$$

$\varphi(x)$ is discontinuous at $x_0$, and the difference

$$m_0 = \varphi(x_0 + 0) - \varphi(x_0 - 0)$$

gives the measure of discontinuity or simply discontinuity. Since
for any number of points of discontinuity $x_0, x_1, \ldots x_n$ the sum of
discontinuities

$$m_0 + m_1 + \cdots + m_n \le \varphi(b) - \varphi(a),$$

the points of discontinuity form a countable set. For there are only a
finite number of discontinuities above any given number, so that, considering
the sequence

$$\delta > \delta_1 > \delta_2 > \cdots$$

tending to 0, there is only a finite number of points with discontinuities
$> \delta$; also a finite number of points with discontinuities $\le \delta$ and $> \delta_1$,
and so on. It follows that points of discontinuity can be arranged into
a single sequence and hence form a countable set.

It may happen, however, that $\varphi(x)$ has discontinuities in any
interval, no matter how small; but at any rate there are points of continuity
in any interval. If $\varphi(x_0 + \epsilon) > \varphi(x_0 - \epsilon)$ for all sufficiently small
$\epsilon > 0$ the point $x_0$ is called a "point of increase" of $\varphi(x)$. In particular,
any point of discontinuity is a point of increase.

3. Let $f(x)$ be a continuous function in the interval $a \le x \le b$. By
inserting points $x_1 < x_2 < \cdots < x_n$ this interval is subdivided into
$n + 1$ partial intervals. In each of these we arbitrarily select points
$\xi_0, \xi_1, \ldots \xi_n$ and form the sum

$$S = f(\xi_0)[\varphi(x_1) - \varphi(a)] + f(\xi_1)[\varphi(x_2) - \varphi(x_1)] + \cdots + f(\xi_n)[\varphi(b) - \varphi(x_n)].$$

It can be proved in the same way as for ordinary integrals that when
all intervals

$$x_1 - a,\; x_2 - x_1,\; \ldots\; b - x_n$$

tend to zero uniformly, the sum $S$ tends to a definite limit. This limit,
called Stieltjes' integral, does not depend upon the manner of subdividing
the interval $(a, b)$ or upon the choice of points $\xi_0, \xi_1, \ldots \xi_n$. It has
a perfectly definite value as soon as $f(x)$ and $\varphi(x)$ (together with $a$, $b$)
are given, and accordingly is denoted by

$$\int_a^b f(x)\,d\varphi(x).$$

In case $\varphi(x)$ has a continuous derivative, $d\varphi(x)$ can be interpreted
as the ordinary differential; Stieltjes' integral then coincides with the
ordinary one. In other cases $d\varphi(x)$ is a new symbol introduced as a
reminder of the origin of Stieltjes' integral. In particular, if $\varphi(x)$ is a
step function with discontinuities $p_1, p_2, p_3, \ldots$ at the points $x_1,
x_2, x_3, \ldots$, Stieltjes' integral coincides with the sum

$$\sum p_i f(x_i)$$

which is a finite sum or an absolutely convergent infinite series according
as the set of points of discontinuity is finite or infinite.
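The statement that Stieltjes' integral with respect to a step function reduces to $\sum p_i f(x_i)$ can be checked numerically. The sketch below is only an illustration: the function names, the equal subdivision, and the choice of midpoints for $\xi_k$ are assumptions made here, and the step function is the four-point example of Sec. 1.

```python
def stieltjes_sum(f, phi, a, b, n):
    """Form the sum S of Sec. 3 with n inserted points (n + 1 equal
    partial intervals), taking each xi_k as the midpoint of its interval."""
    points = [a + (b - a) * k / (n + 1) for k in range(n + 2)]
    total = 0.0
    for left, right in zip(points, points[1:]):
        xi = 0.5 * (left + right)
        total += f(xi) * (phi(right) - phi(left))
    return total


# Step function of Sec. 1: jumps of 1/4 at the points -2, 0, 1, 3.
jumps = {-2.0: 0.25, 0.0: 0.25, 1.0: 0.25, 3.0: 0.25}


def step_phi(x):
    # Sum of the jumps located strictly to the left of x.
    return sum(p for point, p in jumps.items() if point < x)


def square(x):
    return x * x


approx = stieltjes_sum(square, step_phi, -5.0, 5.0, 20_000)
exact = sum(p * square(point) for point, p in jumps.items())
print(approx, exact)
```

As the subdivision is refined, `approx` should approach `exact`, since each jump contributes $p_i f(\xi)$ with $\xi$ within one partial interval of $x_i$.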
Stieltjes' integrals possess many properties of ordinary integrals.
For instance, the mean-value theorem holds for them in the form:

$$\int_a^b f(x)\,d\varphi(x) = f(\xi)[\varphi(b) - \varphi(a)]$$

where $a \le \xi \le b$. Also, if $f(x)$ has a continuous derivative, we have an
analogue for the integration by parts

$$\int_a^b f(x)\,d\varphi(x) = f(b)\varphi(b) - f(a)\varphi(a) - \int_a^b \varphi(x)\,df(x)$$

where $df(x)$ means an ordinary differential and the integral in the right
member is an ordinary integral. However, some important properties
of ordinary integrals do not hold universally for Stieltjes' integrals. For
instance, considered as functions of $b$ or $a$, they may have discontinuities.

In the definition of Stieltjes' integral it was assumed that $a$ and $b$
were finite numbers. Stieltjes' integral over the interval $-\infty, +\infty$ is
defined in an ordinary way as being the limit of

$$\int_a^b f(x)\,d\varphi(x)$$

when $a$ and $b$ tend independently to $-\infty$ and $+\infty$, respectively. In
other words,

$$\int_{-\infty}^{\infty} f(x)\,d\varphi(x) = \lim \int_a^b f(x)\,d\varphi(x) \quad \text{when } a \to -\infty,\; b \to +\infty,$$

provided this limit exists. If it does not exist, the symbol

$$\int_{-\infty}^{\infty} f(x)\,d\varphi(x)$$

has no meaning.

The General Concept of Distribution 
4. The most general type of distribution function of probability,
covering all imaginable cases, is given by a never decreasing function
$F(t)$ defined for all real values of $t$ and varying from $F(-\infty) = 0$ to
$F(+\infty) = 1$. If at points of discontinuity we set

$$F(t) = F(t - 0),$$

then for any $t$ the probability of the inequality

$$x < t$$

will be given by $F(t)$. Also, the probability of the inequalities

$$t_1 \le x < t_2$$

will be

$$F(t_2) - F(t_1).$$
The case of continuous $F(t)$, having a continuous derivative $f(t)$
(save for a finite set of points of discontinuity), corresponds to a continuous
variable distributed with the density $f(t)$, since

$$F(t) = \int_{-\infty}^{t} f(x)\,dx.$$

If $F(t)$ is a step function with a finite number of discontinuities, it characterizes
the distribution of probability of a variable with a finite number
of values. Finally, if $F(t)$ is a step function with an infinite set of discontinuities
distributed without density, it corresponds to a variable
whose values can be arranged in a sequence according to their magnitude.
These are the most important types of variables considered in the
calculus of probability, and for all of them the distribution function can
be represented by Stieltjes' integral

$$F(t) = \int_{-\infty}^{t} dF(x).$$

The mathematical expectation of any continuous function $f(x)$ is
defined by Stieltjes' integral

$$E[f(x)] = \int_{-\infty}^{\infty} f(t)\,dF(t)$$

provided it has a meaning. In particular, moments of the order $n$ ($n$
positive integer) and absolute moments of the order $a$ ($a$ real) are defined,
respectively, by

$$m_n = \int_{-\infty}^{\infty} t^n\,dF(t)$$
$$\mu_a = \int_{-\infty}^{\infty} |t|^a\,dF(t)$$

and we always have

$$|m_n| \le \mu_n.$$

Finally,

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

is the characteristic function of distribution. Since the integral exists
for any real $t$, this function is defined for all real values $t$ and satisfies the
inequality

$$|\varphi(t)| \le 1.$$

Inequalities for Moments

5. Moments of any distribution satisfy certain inequalities, which
it is important to know. They all are particular cases of the following
very general inequality due to Liapounoff.

Liapounoff's Inequality. Let $a$, $b$, $c$ be three real numbers satisfying
the inequalities

$$a \ge b \ge c \ge 0$$

and $\mu_a$, $\mu_b$, $\mu_c$ absolute moments of orders $a$, $b$, $c$ for an arbitrary
distribution. Then the following inequality holds:

$$\mu_b^{a-c} \le \mu_c^{a-b}\,\mu_a^{b-c}.$$

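Proof. a. Let $p_1, p_2, \ldots p_n$; $x_1, x_2, \ldots x_n$ be positive numbers
and

$$\varphi(s) = p_1 x_1^s + p_2 x_2^s + \cdots + p_n x_n^s.$$

Then for arbitrary real numbers $s_1, s_2, \ldots s_p$ the following inequality
holds:

$$\varphi\left(\frac{s_1 + s_2 + \cdots + s_p}{p}\right)^p \le \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p). \tag{1}$$

For $p = 2$ this inequality follows immediately from the known inequality
due to Cauchy:

$$\left(\sum_{i=1}^{n} a_i b_i\right)^2 \le \sum_{i=1}^{n} a_i^2 \cdot \sum_{i=1}^{n} b_i^2$$

by taking in it

$$a_i = \sqrt{p_i x_i^{s_1}}, \quad b_i = \sqrt{p_i x_i^{s_2}}.$$

For $p = 4$ we have

$$\varphi\left(\frac{s_1 + s_2 + s_3 + s_4}{4}\right)^4 \le \varphi\left(\frac{s_1 + s_2}{2}\right)^2 \varphi\left(\frac{s_3 + s_4}{2}\right)^2 \le \varphi(s_1)\varphi(s_2)\varphi(s_3)\varphi(s_4)$$

and continuing in the same manner we find in general that

$$\varphi\left(\frac{s_1 + s_2 + \cdots + s_{2^m}}{2^m}\right)^{2^m} \le \varphi(s_1)\varphi(s_2)\cdots\varphi(s_{2^m}).$$

Let $m$ be taken so that $2^m > p$ and let us take in the last inequality

$$s_{p+1} = s_{p+2} = \cdots = s_{2^m} = s = \frac{s_1 + s_2 + \cdots + s_p}{p}.$$

Since

$$s_1 + s_2 + \cdots + s_{2^m} = ps + (2^m - p)s = 2^m s$$

we shall have

$$\varphi(s)^{2^m} \le \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p)\,\varphi(s)^{2^m - p},$$

whence

$$\varphi(s)^p \le \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p),$$

which is inequality (1).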

b. Let $a \ge b \ge c \ge 0$ be integers. Taking $p = a - c$;
$s_1 = s_2 = \cdots = s_{a-b} = c$, $s_{a-b+1} = \cdots = s_{a-c} = a$, we have

$$\frac{s_1 + s_2 + \cdots + s_{a-c}}{a - c} = \frac{(a - b)c + (b - c)a}{a - c} = b$$

and consequently, by virtue of (1),

$$\varphi(b)^{a-c} \le \varphi(c)^{a-b}\,\varphi(a)^{b-c}. \tag{2}$$

If $a = p/s$, $b = q/s$, $c = r/s$ are rational numbers ($a \ge b \ge c \ge 0$),
it suffices to take, in (2), $p$, $q$, $r$ instead of $a$, $b$, $c$, replace $x_i$ by $x_i^{1/s}$ and
raise both members to the power $1/s$ to ascertain that (2) holds for
rational $a$, $b$, $c$. Finally, the passage to the limit makes it clear that (2)
holds for real $a$, $b$, $c$, provided $a \ge b \ge c \ge 0$.

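c. Let the interval $A$ to $B$ be subdivided into partial intervals by
inserting numbers $t_1 < t_2 < \cdots < t_n$ between $A$ and $B$ and let

$$p_0 = F(t_1) - F(A), \quad p_1 = F(t_2) - F(t_1), \quad \ldots \quad p_n = F(B) - F(t_n)$$

$$x_0 = |A|, \quad x_1 = |t_1|, \quad \ldots \quad x_n = |t_n|.$$

Then the three sums

$$\sum_{0}^{n} p_i x_i^a, \quad \sum_{0}^{n} p_i x_i^b, \quad \sum_{0}^{n} p_i x_i^c$$

will tend to the respective limits

$$\int_A^B |t|^a\,dF(t), \quad \int_A^B |t|^b\,dF(t), \quad \int_A^B |t|^c\,dF(t)$$

when all differences $t_1 - A$, $t_2 - t_1$, $\ldots$ $B - t_n$ tend to 0 uniformly.
Hence, passing to the limit in (2), we get

$$\left(\int_A^B |t|^b\,dF(t)\right)^{a-c} \le \left(\int_A^B |t|^c\,dF(t)\right)^{a-b} \left(\int_A^B |t|^a\,dF(t)\right)^{b-c};$$

and finally, letting $A$ tend to $-\infty$ and $B$ to $+\infty$,

$$\left(\int_{-\infty}^{\infty} |t|^b\,dF(t)\right)^{a-c} \le \left(\int_{-\infty}^{\infty} |t|^c\,dF(t)\right)^{a-b} \left(\int_{-\infty}^{\infty} |t|^a\,dF(t)\right)^{b-c}$$

or

$$\mu_b^{a-c} \le \mu_c^{a-b}\,\mu_a^{b-c}$$

as stated.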


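Taking $b = \dfrac{a + c}{2}$, Liapounoff's inequality becomes

$$\mu_{\frac{a+c}{2}}^{a-c} \le \mu_c^{\frac{a-c}{2}}\,\mu_a^{\frac{a-c}{2}}$$

whence

$$\mu_{\frac{a+c}{2}}^{2} \le \mu_a \mu_c$$

for any two real positive numbers $a$ and $c$. If $k$ and $l$ are two positive
integers and we take $c = 2k$, $a = 2l$, then

$$\mu_{k+l}^2 \le \mu_{2k}\,\mu_{2l}$$

or

$$m_{k+l}^2 \le m_{2k}\,m_{2l}$$

since

$$|m_{k+l}| \le \mu_{k+l} \quad \text{and} \quad \mu_{2k} = m_{2k}, \; \mu_{2l} = m_{2l}.$$

Another important inequality results if we take $c = 0$. Then, since
$\mu_0 = 1$,

$$\mu_b^{a} \le \mu_a^{b}$$

or

$$\mu_b^{\frac{1}{b}} \le \mu_a^{\frac{1}{a}}$$

if $a > b > 0$. This amounts to

$$\frac{\log \mu_b}{b} \le \frac{\log \mu_a}{a} \quad \text{if } a > b$$

which is equivalent to the statement that

$$\frac{\log \mu_x}{x}$$

is an increasing function of $x$ for positive $x$.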
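Liapounoff's inequality and the monotonicity of $\mu_x^{1/x}$ can be spot-checked on a small discrete distribution. The sketch below is a numerical illustration only, not a proof; the sample values and the helper names are assumptions introduced here.

```python
def absolute_moment(values, probs, r):
    """Absolute moment mu_r of a variable with finitely many values."""
    return sum(p * abs(x) ** r for x, p in zip(values, probs))


def liapounoff_gap(values, probs, a, b, c):
    """Right member minus left member of
    mu_b**(a - c) <= mu_c**(a - b) * mu_a**(b - c), for a >= b >= c >= 0."""
    mu = lambda r: absolute_moment(values, probs, r)
    return mu(c) ** (a - b) * mu(a) ** (b - c) - mu(b) ** (a - c)


# Illustrative distribution with nonzero values.
values = [-2.0, 0.5, 1.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]

print(liapounoff_gap(values, probs, 3.0, 2.0, 1.0))
roots = [absolute_moment(values, probs, x) ** (1.0 / x) for x in (0.5, 1.0, 2.0, 3.0, 4.5)]
print(roots)
```

The gap should be nonnegative for every admissible triple, and the printed roots $\mu_x^{1/x}$ should form a nondecreasing list.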


Composition of Distribution Functions 
6. An important problem in the calculus of probability is to find the 
distribution function of the sum of several independent variables when 
distribution functions of these variables are known. It suffices to show 
how this problem can be solved for the sum of two independent variables. 

Let $x$ and $y$ be two independent variables with the corresponding
distribution functions $F(t)$ and $G(t)$. To find the distribution function
$H(t)$ of their sum

$$x + y$$

is the same as to find the probability of the inequality

$$x + y < t$$

for an arbitrary real number $t$. Here, for the sake of simplicity and in
view of the applications we propose to consider later, we shall assume that
one, at least, of the variables $x$, $y$ has continuous distribution with
generally continuous density.

At first, let both $x$ and $y$ have continuous distributions so that

$$F(t) = \int_{-\infty}^{t} f(x)\,dx; \quad G(t) = \int_{-\infty}^{t} g(x)\,dx.$$

The probability of the inequality

$$x + y < t$$

according to the general principles stated in Chap. XII is expressed by
the double integral

$$H(t) = \iint f(x)g(y)\,dx\,dy$$

extended over the domain

$$x + y < t.$$

Now, following ordinary rules, we can reduce this double integral to a
repeated integral. To this end, for any fixed $x$ we integrate $g(y)$ between
limits $-\infty$ and $t - x$, thus obtaining

$$\int_{-\infty}^{t-x} g(y)\,dy = G(t - x).$$

Then, after multiplying by $f(x)$, we integrate the resulting expression
between limits $-\infty$ and $+\infty$ for $x$. The final result will be

$$H(t) = \int_{-\infty}^{\infty} G(t - x) f(x)\,dx$$

or, written as Stieltjes' integral,

$$H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x).$$

In the second place, let $x$ be a discontinuous variable with different
values $x_1, x_2, x_3, \ldots$ and corresponding probabilities $p_1, p_2, p_3, \ldots$
For $x = x_i$ the inequality

$$x + y < t$$

is equivalent to

$$y < t - x_i$$

and the probability of this inequality is $G(t - x_i)$. Since the probability
of $x = x_i$ is $p_i$, the compound probability of the two events

$$x = x_i$$

$$x + y < t$$

will be

$$p_i G(t - x_i).$$

The total probability $H(t)$ of the inequality

$$x + y < t$$

will be expressed by the sum

$$H(t) = \sum p_i G(t - x_i)$$

extended over all possible values of $x$. But this sum can again be written
as Stieltjes' integral:

$$H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x). \tag{1}$$

In both cases we obtain the same expression for $H(t)$. Evidently
$H(t)$ can also be defined as the mathematical expectation of $G(t - x)$:

$$H(t) = E[G(t - x)]$$

taken with respect to the variable $x$. The important formula (1) is
known as the formula for composition of distribution functions $F(t)$
and $G(t)$.
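For a discontinuous $x$ the composition formula is a finite sum and is easy to evaluate. The sketch below takes, purely as an assumed illustration, a variable $x$ with the two values $-1$ and $+1$ of probability 1/2 each, and a normally distributed $y$; the function names are chosen here.

```python
import math


def normal_cdf(t, sigma=1.0):
    """Distribution function G(t) of a normal variable with mean 0."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))


def compose(t, values, probs, G):
    """Composition formula for a discontinuous x: H(t) = sum of p_i * G(t - x_i)."""
    return sum(p * G(t - x) for x, p in zip(values, probs))


values = [-1.0, 1.0]
probs = [0.5, 0.5]

for t in (-3.0, 0.0, 0.5, 3.0):
    print(t, compose(t, values, probs, normal_cdf))
```

By symmetry the value at $t = 0$ is exactly 1/2, and $H(t)$ rises from 0 to 1 as $t$ increases.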


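Example. Let $x$ and $y$ be two normally distributed variables with means = 0
and respective standard deviations $\sigma_1$ and $\sigma_2$. Instead of using (1), it is better to
write $H(t)$ as a double integral

$$H(t) = \frac{1}{2\pi\sigma_1\sigma_2} \iint e^{-\frac{x^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}}\,dx\,dy$$

extended over the domain

$$x + y < t.$$

To evaluate this integral, it is natural to introduce $x + y = z$ as a new variable and
find constants $C$, $D$, $\alpha$ so as to have identically

$$\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2} = Cz^2 + D(x - \alpha y)^2,$$

whence one easily finds

$$C = \frac{1}{2(\sigma_1^2 + \sigma_2^2)}, \quad \alpha = \frac{\sigma_1^2}{\sigma_2^2}, \quad D = \frac{\sigma_2^2}{2\sigma_1^2(\sigma_1^2 + \sigma_2^2)}$$

and

$$\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2} = \frac{z^2}{2(\sigma_1^2 + \sigma_2^2)} + \frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}.$$

The Jacobian of

$$z = x + y, \quad u = \frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y$$

with respect to $x$, $y$ being

$$\begin{vmatrix} 1 & 1 \\ \dfrac{\sigma_2}{\sigma_1} & -\dfrac{\sigma_1}{\sigma_2} \end{vmatrix} = -\frac{\sigma_1^2 + \sigma_2^2}{\sigma_1\sigma_2},$$

$H(t)$ can be presented as the double integral

$$H(t) = \frac{1}{2\pi(\sigma_1^2 + \sigma_2^2)} \iint e^{-\frac{z^2 + u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz\,du$$

with the domain of integration defined by a single inequality:

$$z < t.$$

Hence,

$$H(t) = \frac{1}{2\pi(\sigma_1^2 + \sigma_2^2)} \int_{-\infty}^{t} e^{-\frac{z^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz \int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,du$$

or

$$H(t) = \frac{1}{\sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}} \int_{-\infty}^{t} e^{-\frac{z^2}{2(\sigma_1^2 + \sigma_2^2)}}\,dz$$

since

$$\int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2 + \sigma_2^2)}}\,du = \sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}.$$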


The expression obtained for $H(t)$ leads to a remarkable conclusion:
The sum of two normally distributed variables with means = 0 and
standard deviations $\sigma_1$ and $\sigma_2$ is also a normally distributed variable with
the mean = 0 and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. If means
of $x$ and $y$ are $a_1$ and $a_2$, then evidently $z$ will be normally distributed
with the mean $a = a_1 + a_2$ and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$.
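The conclusion about the sum of two normal variables can be confirmed by evaluating the composition integral numerically. The sketch below is an assumed illustration: it applies a midpoint rule over a truncated range to $\int G(t - x) f(x)\,dx$ and compares the result with the normal distribution function of standard deviation $\sqrt{\sigma_1^2 + \sigma_2^2}$; all names and the truncation at ten standard deviations are choices made here.

```python
import math


def normal_pdf(x, sigma):
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(t, sigma):
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))


def composed_cdf(t, sigma1, sigma2, steps=4000):
    """Midpoint-rule value of the integral of G(t - x) f(x) dx, where x has
    standard deviation sigma1 and y has standard deviation sigma2."""
    lo, hi = -10.0 * sigma1, 10.0 * sigma1  # the tails beyond are negligible
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += normal_cdf(t - x, sigma2) * normal_pdf(x, sigma1)
    return total * dx


sigma1, sigma2 = 3.0, 4.0
for t in (-2.0, 1.0, 6.0):
    print(t, composed_cdf(t, sigma1, sigma2), normal_cdf(t, math.hypot(sigma1, sigma2)))
```

With $\sigma_1 = 3$ and $\sigma_2 = 4$ the two printed columns should agree closely, the combined standard deviation being 5.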

Repeated application of this result leads to the following important
theorem:

If $x_1, x_2, \ldots x_n$ are normally distributed independent variables with
means $a_1, a_2, \ldots a_n$ and standard deviations $\sigma_1, \sigma_2, \ldots \sigma_n$, then their sum

$$z = x_1 + x_2 + \cdots + x_n$$

is again normally distributed with the mean $a = a_1 + a_2 + \cdots + a_n$
and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$.

Finally, any linear function

$$\omega = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$

is normally distributed with the mean $a = c_1 a_1 + c_2 a_2 + \cdots + c_n a_n$
and the standard deviation $\sigma = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2}$. In
particular, the arithmetic mean

$$\frac{x_1 + x_2 + \cdots + x_n}{n}$$

of identical normally distributed variables with the mean $a$ and the
standard deviation $\sigma$ is normally distributed about the mean $a$ and with
the standard deviation $\sigma/\sqrt{n}$. Hence, the conclusion may be drawn
that the probability $P$ of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - a\right| < \epsilon$$

is given by

$$P = \frac{2}{\sqrt{2\pi}} \int_0^{\frac{\epsilon\sqrt{n}}{\sigma}} e^{-\frac{t^2}{2}}\,dt$$

and rapidly approaches 1 as $n$ increases. This is a more definite form
of the law of large numbers applied to normally distributed (identical or
equal) variables.
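How rapidly $P$ approaches 1 can be seen from a few values. Since the arithmetic mean has the standard deviation $\sigma/\sqrt{n}$, the probability equals the error function evaluated at $\epsilon\sqrt{n}/(\sigma\sqrt{2})$; the short sketch below uses that identity, with a function name chosen here for illustration.

```python
import math


def prob_mean_within(eps, sigma, n):
    """Probability that the arithmetic mean of n identical normal variables
    differs from its mean a by less than eps in absolute value."""
    return math.erf(eps * math.sqrt(n) / (sigma * math.sqrt(2.0)))


for n in (1, 4, 25, 100):
    print(n, prob_mean_within(0.5, 1.0, n))
```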


Determination of Distribution When Its Characteristic Function Is Given

7. One of the most important conclusions to be drawn from the
preceding considerations is that the distribution function of probability
is uniquely determined by the characteristic function. The known
proofs of this fact are rather subtle, owing to the use of conditionally
convergent integrals. However, such integrals can be avoided by resorting
to an ingenious device due to Liapounoff. In the general case, the
distribution function of a variable $x$ has discontinuities. To avoid the
bad effect of these discontinuities, Liapounoff introduces a continuous
variable $y$ that, with reasonable probability, can have values only in the
vicinity of 0. It may be surmised, therefore, that the continuous
distribution function of the sum $x + y$ will approximately represent that
of $x$ and, by disposing of a parameter involved in the distribution function
of $y$, will tend to it as a limit. To make these explanations more definite,
let $y$ be a normally distributed variable whose distribution function is

$$G(t) = \frac{1}{h\sqrt{\pi}} \int_{-\infty}^{t} e^{-\frac{x^2}{h^2}}\,dx.$$

When $h$ is small, the probabilities of any one of the inequalities

$$y > \epsilon, \quad y < -\epsilon$$

will be extremely small and even will tend to 0 when $h$ tends to 0. Hence,
the distribution function $H(t)$ of the sum $x + y$ is likely to tend to
$F(t)$ as a limit when $h$ tends to 0.

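To prove this in all rigor, we apply the composition formula (Sec. 6)
to our case. We obtain the following expression for $H(t)$:

$$H(t) = \frac{1}{h\sqrt{\pi}} \int_{-\infty}^{\infty} dF(x) \int_{-\infty}^{t-x} e^{-\frac{z^2}{h^2}}\,dz$$

or, in more convenient form,

$$H(t) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} dF(x) \int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du;$$

and furthermore, integrating by parts,

$$H(t) = \frac{1}{h\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2} F(x)\,dx.$$

The integral in the right member can be split into three parts

$$\frac{1}{h\sqrt{\pi}} \int_{-\infty}^{t-\epsilon} e^{-\left(\frac{t-x}{h}\right)^2} F(x)\,dx + \frac{1}{h\sqrt{\pi}} \int_{t-\epsilon}^{t+\epsilon} e^{-\left(\frac{t-x}{h}\right)^2} F(x)\,dx + \frac{1}{h\sqrt{\pi}} \int_{t+\epsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2} F(x)\,dx.$$

Now, for positive $T$

$$\frac{1}{\sqrt{\pi}} \int_T^{\infty} e^{-u^2}\,du < \frac{1}{2} e^{-T^2}.$$

Making use of this inequality, we find that

$$\frac{1}{h\sqrt{\pi}} \int_{t+\epsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2} F(x)\,dx < \frac{1}{2} e^{-\frac{\epsilon^2}{h^2}}$$

and similarly

$$\frac{1}{h\sqrt{\pi}} \int_{-\infty}^{t-\epsilon} e^{-\left(\frac{t-x}{h}\right)^2} F(x)\,dx < \frac{1}{2} e^{-\frac{\epsilon^2}{h^2}},$$

so that

$$H(t) = \frac{1}{h\sqrt{\pi}} \int_0^{\epsilon} e^{-\frac{u^2}{h^2}} F(t + u)\,du + \frac{1}{h\sqrt{\pi}} \int_0^{\epsilon} e^{-\frac{u^2}{h^2}} F(t - u)\,du + \theta e^{-\frac{\epsilon^2}{h^2}}, \quad 0 < \theta < 1.$$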


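Given an arbitrary $\sigma > 0$, the number $\epsilon$ can be taken so small that

$$0 \le F(t + u) - F(t + 0) < \sigma$$
$$0 \le F(t - 0) - F(t - u) < \sigma$$

for $0 < u < \epsilon$, whence

$$\left| \frac{1}{h\sqrt{\pi}} \int_0^{\epsilon} e^{-\frac{u^2}{h^2}} F(t + u)\,du - \frac{F(t + 0)}{h\sqrt{\pi}} \int_0^{\epsilon} e^{-\frac{u^2}{h^2}}\,du \right| < \sigma$$

$$\left| \frac{1}{h\sqrt{\pi}} \int_0^{\epsilon} e^{-\frac{u^2}{h^2}} F(t - u)\,du - \frac{F(t - 0)}{h\sqrt{\pi}} \int_0^{\epsilon} e^{-\frac{u^2}{h^2}}\,du \right| < \sigma$$

and

$$H(t) = \frac{F(t + 0) + F(t - 0)}{h\sqrt{\pi}} \int_0^{\epsilon} e^{-\frac{u^2}{h^2}}\,du + \theta'\left(2\sigma + e^{-\frac{\epsilon^2}{h^2}}\right); \quad |\theta'| < 1.$$

On the other hand,

$$\frac{1}{h\sqrt{\pi}} \int_0^{\epsilon} e^{-\frac{u^2}{h^2}}\,du = \frac{1}{2} - \frac{1}{\sqrt{\pi}} \int_{\frac{\epsilon}{h}}^{\infty} e^{-u^2}\,du = \frac{1}{2} - \frac{\theta''}{2} e^{-\frac{\epsilon^2}{h^2}}, \quad 0 < \theta'' < 1,$$

so that finally

$$\left| H(t) - \frac{F(t + 0) + F(t - 0)}{2} \right| < 2\sigma + 2e^{-\frac{\epsilon^2}{h^2}}$$

and for all sufficiently small $h$ ($\epsilon$ being kept fixed)

$$\left| H(t) - \frac{F(t + 0) + F(t - 0)}{2} \right| < 4\sigma;$$

that is,

$$\lim_{h \to 0} H(t) = \frac{F(t + 0) + F(t - 0)}{2}$$

or, if $t$ is a point of continuity,

$$\lim_{h \to 0} H(t) = F(t).$$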
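The behavior of $H(t)$ as $h$ tends to 0 is easy to observe when $F(t)$ is a step function, for then the composition formula is a finite sum and $G(t) = \frac{1}{2}\left[1 + \operatorname{erf}(t/h)\right]$. The sketch below uses the four-point distribution of Sec. 1 as an assumed example; the names are chosen here.

```python
import math

# Four-point distribution of Sec. 1 (jumps of 1/4 at -2, 0, 1, 3).
values = [-2.0, 0.0, 1.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]


def smoothed_cdf(t, h):
    """H(t) = sum of p_i * G(t - x_i), where G(t) = (1 + erf(t / h)) / 2 is the
    distribution function of the auxiliary normal variable y."""
    return sum(p * 0.5 * (1.0 + math.erf((t - x) / h)) for x, p in zip(values, probs))


for h in (1.0, 0.1, 0.01):
    # t = 1 is a point of discontinuity, t = 0.5 a point of continuity.
    print(h, smoothed_cdf(1.0, h), smoothed_cdf(0.5, h))
```

At the point of discontinuity $t = 1$ the values tend to $\frac{1}{2}(0.75 + 0.5) = 0.625$, while at the point of continuity $t = 0.5$ they tend to $F(0.5) = 0.5$.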

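Now we must find another analytical representation for $H(t)$. To
this end we consider the difference

$$H(t) - H(0) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} dF(x) \int_{-\frac{x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du,$$

and, to represent in a convenient way the inner integral, we make use
of the known integral

$$\frac{h}{2\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}} e^{-ihuv}\,dv = e^{-u^2}.$$

Multiplying both sides by $du$ and integrating between $-\dfrac{x}{h}$ and $\dfrac{t - x}{h}$
we find

$$\int_{-\frac{x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}} e^{ivx}\,\frac{1 - e^{-ivt}}{iv}\,dv$$

and

$$H(t) - H(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dF(x) \int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}} e^{ivx}\,\frac{1 - e^{-ivt}}{iv}\,dv.$$

The next step is to reverse the order of integrations, an operation
which can be easily justified in this case. The result will be:

$$H(t) - H(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\frac{1 - e^{-ivt}}{iv}\,dv \int_{-\infty}^{\infty} e^{ivx}\,dF(x)$$

or

$$H(t) - H(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\frac{1 - e^{-ivt}}{iv}\,\varphi(v)\,dv$$

since

$$\varphi(v) = \int_{-\infty}^{\infty} e^{ivx}\,dF(x).$$

Now, taking the limit of $H(t)$ for $h$ converging to 0, we have at any point
of continuity of $F(t)$

$$F(t) = C + \frac{1}{2\pi} \lim_{h \to 0} \int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\frac{1 - e^{-ivt}}{iv}\,\varphi(v)\,dv \tag{2}$$

where the constant

$$C = \frac{F(+0) + F(-0)}{2}$$

is determined by the condition $F(-\infty) = 0$. Thus, the distribution
function is completely determined by (2) at all points of continuity when
the characteristic function $\varphi(v)$ is given.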
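Formula (2) lends itself to a direct numerical trial. The sketch below approximates the integral in (2) by a midpoint rule over a truncated range and so returns $F(t) - C$; the function name, the truncation `v_max`, and the step count are assumptions made here. Two characteristic functions whose distributions are known in closed form, $e^{-v^2/2}$ (normal) and $e^{-|v|}$ (Cauchy), serve as checks.

```python
import cmath
import math


def cdf_increment(phi, t, h=0.0, v_max=40.0, steps=40_000):
    """Midpoint-rule value of
    (1 / 2 pi) * integral of exp(-h^2 v^2 / 4) * (1 - exp(-i v t)) / (i v) * phi(v) dv
    over (-v_max, v_max), that is, an approximation to F(t) - C in formula (2).
    With an even number of steps no midpoint falls on v = 0."""
    dv = 2.0 * v_max / steps
    total = 0.0
    for k in range(steps):
        v = -v_max + (k + 0.5) * dv
        damp = math.exp(-((h * v) ** 2) / 4.0)
        kernel = (1.0 - cmath.exp(-1j * v * t)) / (1j * v)
        total += (damp * kernel * phi(v)).real
    return total * dv / (2.0 * math.pi)


normal_phi = lambda v: math.exp(-v * v / 2.0)
cauchy_phi = lambda v: math.exp(-abs(v))

print(cdf_increment(normal_phi, 1.0), 0.5 * math.erf(1.0 / math.sqrt(2.0)))
print(cdf_increment(cauchy_phi, 1.0), math.atan(1.0) / math.pi)
```

Both characteristic functions decay fast enough that $h = 0$ can be used directly; for slowly decaying $\varphi(v)$ a small positive `h` plays the part of the damping factor in (2).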

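Example 1. Let us apply (2) to find the distribution corresponding to the
characteristic function

$$\varphi(v) = e^{-\frac{\sigma^2v^2}{2}}.$$

Since in this case the integral whose limit we seek is uniformly convergent with
respect to $h$, we find simply

$$F(t) = C + \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{1 - e^{-ivt}}{iv}\,dv = C + \frac{1}{\pi} \int_0^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{\sin tv}{v}\,dv.$$

On the other hand (Chap. VII, page 128),

$$\frac{1}{\pi} \int_0^{\infty} e^{-\frac{\sigma^2v^2}{2}}\,\frac{\sin tv}{v}\,dv = \frac{1}{\sigma\sqrt{2\pi}} \int_0^{t} e^{-\frac{u^2}{2\sigma^2}}\,du$$

so that

$$F(t) = C - \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{0} e^{-\frac{u^2}{2\sigma^2}}\,du + \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.$$

Taking $t = -\infty$, the condition $F(-\infty) = 0$ gives

$$C = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{0} e^{-\frac{u^2}{2\sigma^2}}\,du$$

and so finally

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.$$

Naturally, we find a normal distribution with the standard deviation $\sigma$ (compare page
270).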

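Example 2. What is the distribution determined by the characteristic function
$\varphi(v) = e^{-a|v|}$, $a > 0$?

As in the preceding example we find that

$$F(t) = C + \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-a|v|}\,\frac{1 - e^{-ivt}}{iv}\,dv = C + \frac{1}{\pi} \int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv.$$

But

$$\frac{d}{dt} \int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \int_0^{\infty} e^{-av} \cos tv\,dv = \frac{a}{a^2 + t^2}$$

whence

$$\frac{1}{\pi} \int_0^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \frac{a}{\pi} \int_0^{t} \frac{dx}{a^2 + x^2} = \frac{a}{\pi} \int_{-\infty}^{t} \frac{dx}{a^2 + x^2} - \frac{1}{2}.$$

Thus

$$F(t) = C - \frac{1}{2} + \frac{a}{\pi} \int_{-\infty}^{t} \frac{dx}{a^2 + x^2}$$

and the condition $F(-\infty) = 0$ gives $C = \frac{1}{2}$, so that finally

$$F(t) = \frac{a}{\pi} \int_{-\infty}^{t} \frac{dx}{a^2 + x^2}.$$

Naturally we find the same distribution as that considered in Example 2, page 243.
Sometimes it is called "Cauchy's distribution" with the parameter $a$.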

Composition of Characteristic Functions 
8. Having $n$ independent variables $x_1, x_2, \ldots x_n$ whose characteristic
functions are $\varphi_1(t), \varphi_2(t), \ldots \varphi_n(t)$, the product

$$\varphi(t) = \varphi_1(t)\varphi_2(t) \cdots \varphi_n(t)$$

is the characteristic function of their sum

$$s = x_1 + x_2 + \cdots + x_n.$$

In fact, the characteristic function of $s$ is by definition

$$\varphi(t) = E(e^{its}) = E(e^{itx_1} e^{itx_2} \cdots e^{itx_n}).$$

Since $x_1, x_2, \ldots x_n$ are independent variables, the expectation of the
product

$$e^{itx_1} \cdot e^{itx_2} \cdots e^{itx_n}$$

is equal to the product of the expectations of the factors, whence

$$\varphi(t) = \varphi_1(t)\varphi_2(t) \cdots \varphi_n(t).$$

This simple theorem is of great importance since it determines the
characteristic function of the sum of independent variables and indirectly
its function of distribution.
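The product rule can be verified directly for variables with a finite number of values, where both the distribution of the sum and the characteristic functions are finite sums. The example below (a die and a coin, both introduced here only for illustration) compares the two sides.

```python
import cmath


def char_fn(dist, t):
    """Characteristic function of a variable given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())


def sum_distribution(d1, d2):
    """Distribution of the sum of two independent variables."""
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out


die = {k: 1.0 / 6.0 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}
total = sum_distribution(die, coin)

t = 0.8
print(char_fn(total, t), char_fn(die, t) * char_fn(coin, t))
```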

9. A few examples will illustrate the preceding remark. 

Example 1. Consider $n$ independent normally distributed variables $x_1, x_2, \ldots x_n$
with means = 0 and standard deviations $\sigma_1, \sigma_2, \ldots \sigma_n$. Their characteristic
functions are

$$\varphi_k(t) = e^{-\frac{\sigma_k^2t^2}{2}}; \quad k = 1, 2, \ldots n$$

and the characteristic function of their sum

$$s = x_1 + x_2 + \cdots + x_n$$

will be

$$\varphi(t) = e^{-\frac{\sigma^2t^2}{2}}$$

where

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2.$$

Hence $s$ is a normally distributed variable with the mean 0 and the standard deviation

$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$$

as we found previously by a method involving a considerable amount of calculation.

Example 2. Independent variables $x_1, x_2, \ldots x_n$ have Cauchy's distributions
with parameters $a_1, a_2, \ldots a_n$. Since the characteristic function of $x_k$ is

$$\varphi_k(t) = e^{-a_k|t|}$$

the characteristic function of the sum

$$s = x_1 + x_2 + \cdots + x_n$$

will be

$$\varphi(t) = e^{-a|t|}$$

where

$$a = a_1 + a_2 + \cdots + a_n.$$

Hence, $s$ again has Cauchy's distribution with the parameter $a_1 + a_2 + \cdots + a_n$.
Example 3. Let $x_1, x_2, \ldots x_n$ be independent variables with uniform distribution
of probability in the interval (0, 1). The characteristic function of any one of
them is

$$\int_0^1 e^{itx}\,dx = \frac{e^{it} - 1}{it}.$$

Hence, the characteristic function of their sum $s$ will be

$$\varphi(t) = \left(\frac{e^{it} - 1}{it}\right)^n.$$

The distribution function of $s$ is given by

$$F(t) = C + \frac{1}{2\pi} \lim_{h \to 0} \int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}} \left(\frac{e^{iv} - 1}{iv}\right)^n \frac{1 - e^{-itv}}{iv}\,dv$$

and, since the integral again is uniformly convergent,

$$F(t) = C + \frac{1}{2\pi} \int_{-\infty}^{\infty} \left(\frac{e^{iv} - 1}{iv}\right)^n \frac{1 - e^{-itv}}{iv}\,dv.$$

The evaluation of this integral presents certain difficulties. To avoid them we
notice that the integrand considered as a function of a
complex variable $v$ is holomorphic everywhere. Hence,
we can substitute for the rectilinear path of integration
the path $\Gamma$ as shown in Fig. 20.

Fig. 20. (The path $\Gamma$, drawn relative to the real axis and the origin $O$.)

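Now it is easy to show that integrating over the path $\Gamma$ we have

$$I(g) = \int_\Gamma \frac{e^{igz}}{z^{n+1}}\,dz = \begin{cases} 0 & \text{if } g \ge 0 \\[4pt] -2\pi i\,\dfrac{(ig)^n}{n!} & \text{if } g \le 0. \end{cases}$$

The integral

$$\int_\Gamma \left(\frac{e^{iz} - 1}{iz}\right)^n \frac{dz}{iz}$$

being a linear combination of integrals of the type $I(g)$ with $g \ge 0$ reduces to 0.
Similarly,

$$-\frac{1}{2\pi} \int_\Gamma \left(\frac{e^{iz} - 1}{iz}\right)^n e^{-itz}\,\frac{dz}{iz} = \frac{1}{n!} \sum_k (-1)^k \binom{n}{k} (t - k)^n,$$

the sum being extended over those values of $k$ for which $t - k$ is positive.
Referring to the above expression of $F(t)$, we find that

$$F(t) = C + \frac{1}{n!} \sum_k (-1)^k \binom{n}{k} (t - k)^n.$$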
The constant $C = 0$ since $F(t)$ and the sum in the right member both vanish for
$t = 0$. The final expression of $F(t)$ is, therefore:

$$F(t) = \frac{1}{1 \cdot 2 \cdot 3 \cdots n} \left[ t^n - \frac{n}{1}(t - 1)^n + \frac{n(n - 1)}{1 \cdot 2}(t - 2)^n - \cdots \right].$$

The series in the right member is continued as long as arguments remain positive.
Such is the probability that the sum

$$x_1 + x_2 + \cdots + x_n$$

of $n$ independent variables, uniformly distributed throughout the interval (0, 1), will
be less than $t$. The above expression is due to Laplace, who, however, obtained it in
quite a different manner.
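Laplace's expression is convenient for computation. The sketch below evaluates it term by term, stopping as soon as the argument $t - k$ ceases to be positive; the function name is chosen here, and the values outside $0 < t < n$ are set to 0 and 1 directly.

```python
import math


def laplace_uniform_sum_cdf(t, n):
    """Probability that the sum of n independent variables, uniformly
    distributed over (0, 1), is less than t (Laplace's formula)."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0
    total = 0.0
    k = 0
    while k <= n and t - k > 0:  # continue while the argument stays positive
        total += (-1) ** k * math.comb(n, k) * (t - k) ** n
        k += 1
    return total / math.factorial(n)


print(laplace_uniform_sum_cdf(0.5, 2))
print(laplace_uniform_sum_cdf(1.5, 3))
```

For $n = 2$ and $t = 0.5$ the result is the area $0.125$ of the triangle $x_1 + x_2 < 0.5$ in the unit square, and for $n = 3$, $t = 1.5$ symmetry gives $0.5$.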


Problems for Solution 

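1. Prove directly the inequality

$$\mu_{\frac{a+c}{2}}^{2} \le \mu_a \mu_c$$

for absolute moments.

Hint: The quadratic form in $\lambda$, $\mu$

$$\int_{-\infty}^{\infty} \left(\lambda |x|^{\frac{a}{2}} + \mu |x|^{\frac{c}{2}}\right)^2 d\varphi(x)$$

is definite or semidefinite. Show that the equality sign cannot hold if $\varphi(x)$ has at
least two points of increase $\alpha$, $\beta$ such that $\alpha/\beta$ is neither 0 nor $\pm 1$.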

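2. Let $x_1, x_2, \ldots x_n$ be $n$ variables. Denoting the absolute moment of the order
$a$ for $x_i$ by $\mu_a^{(i)}$ and by $\omega_s$ the quotient

$$\omega_s = \frac{\mu_{2+s}^{(1)} + \mu_{2+s}^{(2)} + \cdots + \mu_{2+s}^{(n)}}{\left(\mu_2^{(1)} + \mu_2^{(2)} + \cdots + \mu_2^{(n)}\right)^{1 + \frac{s}{2}}}$$

prove that

$$\omega_s^{\frac{1}{s}} \le \omega_{s'}^{\frac{1}{s'}}$$

if $s' > s > 0$.

Hint: Use Liapounoff's inequality.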

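3. A variable is distributed over the interval $(0, +\infty)$ with a decreasing density of
probability. Show that in this case moments $M_2$ and $M_4$ satisfy the inequality

$$M_2^2 \le \tfrac{5}{9} M_4 \quad \text{(Gauss)}$$

and that in general

$$[(\mu + 1)M_\mu]^{\frac{1}{\mu}} \le [(\nu + 1)M_\nu]^{\frac{1}{\nu}}$$

if $\nu > \mu > 0$.

Indication of the Proof. Show first that the existence of the integral

$$\int_0^{\infty} x^\nu f(x)\,dx$$

in case $f(x)$ is a positive and decreasing function implies the existence of the limit

$$\lim a^{\nu+1} f(a) = 0; \quad a \to \infty.$$

Hence, deduce that

$$\int_0^{\infty} x\,d\psi(x) = 1, \quad \int_0^{\infty} x^{\mu+1}\,d\psi(x) = (\mu + 1)M_\mu, \quad \int_0^{\infty} x^{\nu+1}\,d\psi(x) = (\nu + 1)M_\nu$$

where $\psi(x) = f(0) - f(x)$ and, finally, apply the inequality

$$\left[\int_0^{\infty} x^{\mu+1}\,d\psi(x)\right]^{\nu} \le \left[\int_0^{\infty} x\,d\psi(x)\right]^{\nu - \mu} \cdot \left[\int_0^{\infty} x^{\nu+1}\,d\psi(x)\right]^{\mu}.$$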

4 . Using the composition formula (1), page 269, prove Laplace’s formula on 
page 278 by mathematical induction. 

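5. Prove that the distribution function of probability for a variable whose characteristic
function $\varphi(t)$ is given can be determined by the formula

$$F(t) = C + \lim_{h \to 0} \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\varphi(v)}{1 + h^2v^2}\,\frac{1 - e^{-itv}}{iv}\,dv.$$

Hint: In carrying out Liapounoff's idea, take an auxiliary variable with the distribution

$$G(t) = \frac{1}{2h} \int_{-\infty}^{t} e^{-\frac{|x|}{h}}\,dx.$$

Also make use of the integral

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{-iux}\,dx}{1 + x^2} = e^{-|u|}.$$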

Many definite integrals can be evaluated using the relation between characteristic 
and distribution functions, as the following example shows. 

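6. Let $x$ be distributed over $(-\infty, +\infty)$ with the density $\frac{1}{2}e^{-|x|}$. The characteristic
function being in this case

$$\varphi(t) = \frac{1}{1 + t^2}$$

we find

$$F(t) = C + \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-itv}}{iv(1 + v^2)}\,dv = \frac{1}{2} \int_{-\infty}^{t} e^{-|x|}\,dx$$

whence

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\cos tv}{1 + v^2}\,dv = e^{-|t|},$$

an integral due to Laplace.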

7. A variable is said to have Poisson's distribution if it can have only integral
values 0, 1, 2, $\ldots$ and the probability of $x = k$ is

$$\frac{a^k e^{-a}}{k!};$$

the quantity $a$ is called "parameter" of distribution. If $n$ variables have Poisson's
distribution with parameters $a_1, a_2, \ldots a_n$, show that their sum has also Poisson's
distribution, the parameter of which is $a_1 + a_2 + \cdots + a_n$.
8. Prove the following result:

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \left(\frac{\sin v}{v}\right)^n \frac{\sin tv}{v}\,dv = -\frac{1}{2} + \frac{1}{2 \cdot 4 \cdot 6 \cdots 2n} \left[ (t + n)^n - \frac{n}{1}(t + n - 2)^n + \frac{n(n - 1)}{1 \cdot 2}(t + n - 4)^n - \cdots \right],$$

the series being continued as long as arguments remain positive.

Hint: Consider the sum of $n$ uniformly distributed variables in the interval
$(-1, +1)$ and express its distribution function in two different ways.

9. Establish the expression for the mathematical expectation of the absolute
value of the sum of $n$ uniformly distributed variables in the interval $(-\frac{1}{2}, +\frac{1}{2})$:

$$E|x_1 + x_2 + \cdots + x_n| = \frac{2}{2 \cdot 4 \cdot 6 \cdots (2n + 2)} \left[ n^{n+1} - \frac{n}{1}(n - 2)^{n+1} + \frac{n(n - 1)}{1 \cdot 2}(n - 4)^{n+1} - \cdots \right],$$

the series being continued as long as the arguments remain positive.

Hint: Apply Laplace's formula on page 278, conveniently modified, to express the
expectation of $x_1 + x_2 + \cdots + x_n$ and that of $|x_1 + x_2 + \cdots + x_n|$.

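10. Show that under the same conditions as in Prob. 9

$$E|x_1 + x_2 + \cdots + x_n| = \frac{n}{\pi} \int_0^{\infty} \left(\frac{\sin t}{t}\right)^{n-1} \frac{\sin t - t\cos t}{t^3}\,dt.$$

Hint: Prove and use the following formula

$$\lim_{T \to \infty} \int_{-T}^{T} \frac{e^{iwx} - 1 - iwx}{x^2}\,dx = -\pi|w|.$$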


11. Let $x_1$ and $x_2$ be two identical and normally distributed variables with the
mean = 0 and the standard deviation $\sigma$. If $x$ is defined as the greater of the values
$|x_1|$, $|x_2|$, that is,

$$x = \max(|x_1|, |x_2|)$$

find the mean value of $x$ as well as that of $x^2$. Ans.


12. Let

$$x = \min(|x_1|, |x_2|, \ldots |x_n|)$$

where $x_1, x_2, \ldots x_n$ are identical normally distributed variables with the mean = 0
and the standard deviation $\sigma$. Find the mean value of $x$. Ans. Setting for brevity

$$\frac{2}{\sigma\sqrt{2\pi}} \int_0^{t} e^{-\frac{u^2}{2\sigma^2}}\,du = \theta(t),$$

we have

$$E(x) = \int_0^{\infty} [1 - \theta(t)]^n\,dt.$$

In particular for $n = 2$

$$E(x) = \frac{2\sigma}{\sqrt{\pi}}(\sqrt{2} - 1).$$

For large $n$ asymptotically

$$E(x) \sim \frac{\sigma\sqrt{\pi/2}}{n + 1}.$$


13. A variable with the mean = 0 and the standard deviation = 1 is called a
"reduced variable." By changing the origin and the unit of measurement any
variable can be made reduced. For, if $x$ has the mean $a$ and the standard deviation $\sigma$
the variable

$$u = \frac{x - a}{\sigma}$$

is reduced. The distribution function of the reduced variable $u$ can be called the
"reduced law of distribution."

As we have seen, variables $x_1$ and $x_2$ with normal distribution have the same
reduced law of distribution, as does their sum. The question may be raised: Is the
normal law of distribution a unique law possessing this property? (G. Pólya.)

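Solution. Let $x_1$, $x_2$ be two variables for which the second moment of the distribution
exists, so that we can speak of their means and standard deviations. Let $x_1$
have its mean $a_1$ and its standard deviation $\sigma_1$; likewise, let $a_2$ and $\sigma_2$ be the mean and
the standard deviation of $x_2$. Three reduced variables

$$u_1 = \frac{x_1 - a_1}{\sigma_1}, \quad u_2 = \frac{x_2 - a_2}{\sigma_2}, \quad u_3 = \frac{x_1 + x_2 - a_1 - a_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$

have by hypothesis the same law of distribution. Hence, they have the same characteristic
function $\varphi(t)$ whence we can draw the conclusion that the characteristic
functions of $x_1$, $x_2$, $x_1 + x_2$ are, respectively,

$$\varphi_1(t) = e^{ia_1t}\varphi(\sigma_1 t); \quad \varphi_2(t) = e^{ia_2t}\varphi(\sigma_2 t); \quad \varphi_3(t) = e^{i(a_1 + a_2)t}\varphi\left(\sqrt{\sigma_1^2 + \sigma_2^2}\,t\right).$$

Since

$$\varphi_3(t) = \varphi_1(t)\varphi_2(t)$$

we must have for an arbitrary real $t$

$$\varphi(\sigma_1 t)\varphi(\sigma_2 t) = \varphi\left(\sqrt{\sigma_1^2 + \sigma_2^2}\,t\right)$$

or

$$\varphi(\alpha t)\varphi(\beta t) = \varphi(t) \tag{1}$$

where

$$\alpha = \frac{\sigma_1}{\sqrt{\sigma_1^2 + \sigma_2^2}}, \quad \beta = \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}; \quad \alpha^2 + \beta^2 = 1.$$

Since (1) holds for every real $t$, we shall have

$$\varphi(\alpha t) = \varphi(\alpha^2 t)\varphi(\alpha\beta t), \quad \varphi(\beta t) = \varphi(\alpha\beta t)\varphi(\beta^2 t)$$

and

$$\varphi(t) = \varphi(\alpha^2 t)\,\varphi(\alpha\beta t)^2\,\varphi(\beta^2 t). \tag{2}$$

Applying (1) again to each of these factors in the right member of (2), we find that

$$\varphi(t) = \varphi(\alpha^3 t)\,\varphi(\alpha^2\beta t)^3\,\varphi(\alpha\beta^2 t)^3\,\varphi(\beta^3 t) \tag{3}$$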
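and proceeding in the same way, we arrive at the general formula

$$\varphi(t) = \varphi(\alpha^n t)^{p_0}\,\varphi(\alpha^{n-1}\beta t)^{p_1} \cdots \varphi(\beta^n t)^{p_n} \tag{4}$$

where $p_0, p_1, \ldots p_n$ are coefficients in the expansion

$$(1 + z)^n = p_0 + p_1 z + \cdots + p_n z^n.$$

The arguments

$$v_0 = \alpha^n t, \quad v_1 = \alpha^{n-1}\beta t, \quad \ldots \quad v_n = \beta^n t$$

tend uniformly to 0 since $\alpha < 1$, $\beta < 1$. The quotient

$$\frac{\varphi(v) - 1}{v^2} = -\int_{-\infty}^{\infty} t^2\,dF(t) \int_0^1 (1 - x)e^{ivtx}\,dx$$

is represented by a uniformly convergent integral; hence

$$\lim_{v \to 0} \frac{\varphi(v) - 1}{v^2} = -\frac{1}{2} \int_{-\infty}^{\infty} t^2\,dF(t) = -\frac{1}{2}$$

or

$$\varphi(v) = 1 + [-\tfrac{1}{2} + \epsilon(v)]v^2$$

where

$$\epsilon(v) \to 0 \quad \text{as } v \to 0.$$

At the same time

$$\log \varphi(v) = [-\tfrac{1}{2} + \delta(v)]v^2 \quad \text{(principal branch of log)}$$

where again

$$\delta(v) \to 0 \quad \text{as } v \to 0.$$

Now, taking logarithms of both members of (4)

$$\log \varphi(t) = -\tfrac{1}{2}t^2\left(p_0\alpha^{2n} + p_1\alpha^{2n-2}\beta^2 + \cdots + p_n\beta^{2n}\right) + Q = -\tfrac{1}{2}t^2 + Q$$

where

$$Q = t^2\left[p_0\delta(v_0)\alpha^{2n} + p_1\delta(v_1)\alpha^{2n-2}\beta^2 + \cdots + p_n\delta(v_n)\beta^{2n}\right].$$

Given $\epsilon > 0$, we can take $n$ so large that

$$|\delta(v_i)| < \epsilon; \quad i = 0, 1, \ldots n$$

whence

$$|Q| < \epsilon t^2.$$

Thus

$$\left|\log \varphi(t) + \tfrac{1}{2}t^2\right| < \epsilon t^2$$

and since $\epsilon$ can be taken arbitrarily small,

$$\log \varphi(t) + \tfrac{1}{2}t^2 = 0$$

or

$$\varphi(t) = e^{-\frac{t^2}{2}},$$

which shows that the normal law is the only one with the required properties, among
all laws with finite second moments.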
CHAPTER XIV


FUNDAMENTAL LIMIT THEOREMS 

1. Bernoulli's theorem, as we have seen in Chap. VII, follows from a
more general one known as Laplace's limit theorem. In terms already
familiar to us, this theorem can be stated as follows: Let an event $E$
occur $m$ times in a series of $n$ independent trials with constant probability
$p$. As $n$ becomes infinite, the distribution function of the quotient

$$\frac{m - np}{\sqrt{npq}}$$

approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as a limit; or, to state it in a less precise form, the distribution of the
above quotient tends to normal.
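The statement can be illustrated by computing the distribution function of the quotient exactly for a moderately large $n$ and setting it beside the normal limit. The sketch below is an assumed illustration: the function names and the particular values of $n$, $p$, $t$ are chosen here, and the exact binomial probabilities are summed directly, which is practical only for moderate $n$.

```python
import math


def binomial_quotient_cdf(t, n, p):
    """Exact probability that (m - n p) / sqrt(n p q) < t, where m is the
    number of occurrences of the event in n independent trials."""
    q = 1.0 - p
    cutoff = n * p + t * math.sqrt(n * p * q)
    return sum(
        math.comb(n, m) * p ** m * q ** (n - m)
        for m in range(n + 1)
        if m < cutoff
    )


def normal_limit(t):
    """Limit distribution function: the normal law with mean 0 and standard deviation 1."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


for t in (-1.0, 0.0, 1.0, 2.0):
    print(t, binomial_quotient_cdf(t, 1000, 0.3), normal_limit(t))
```

For $n = 1000$ and $p = 0.3$ the two columns should already differ only slightly; the residual difference reflects the discontinuities of the binomial distribution function.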

Just as Bernoulli’s theorem itself is a very particular case of the general 
law of large numbers, so Laplace’s limit theorem is a special case of 
another extremely general theorem, the discovery of which by Laplace 
may be considered as the crowning achievement of his persistent efforts, 
extending over a period of more than twenty years, to find the approxi- 
mate distribution of probability for sums consisting of a great many 
independent components with almost arbitrary distributions. The 
result at which Laplace finally arrived is as astonishing as it is simple: 
if $x_1, x_2, \ldots, x_n$ ($E(x_i) = 0$, $i = 1, 2, \ldots, n$) are independent variables
(subject to some very mild limitations not stated, however, by Laplace)
and $B_n$ is the dispersion of their sum, then for large n the distribution of
the quotient

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

is nearly normal. To put it more precisely, the distribution function
of this quotient tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as n becomes infinite.

Laplace’s attempt to prove this important proposition does not stand 
the test of modern rigor and, besides, cannot easily be made rigorous. 

The same is true of the attempts made by later investigators, notably
Poisson, Cauchy, and many others. Only after a lapse of many years 
were truly rigorous proofs of Laplace's theorem given. This important 
achievement is the result of the work of three great Russian mathemati- 
cians: Tshebysheff (1887), Markoff (1898), and Liapounoff (1900-1901). 
An account of Tshebysheff's and Markoff's ingenious investigations is 
given in Appendix II. Here we shall follow Liapounoff; for his method 
of proof has the advantage of simplicity even compared with more recent 
proofs, of which that given by J. W. Lindeberg deserves special mention.¹

2. Before going into details of analysis, we shall state the limit theo- 
rem in a very general form due to Liapounoff. 

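Laplace-Liapounoff's Theorem. Let $x_1, x_2, \ldots, x_n$ be independent
variables with their means = 0, possessing absolute moments of the order
$2 + \delta$ (where $\delta$ is some number > 0):

$$\mu_{2+\delta}^{(i)} = E|x_i|^{2+\delta}; \qquad i = 1, 2, \ldots, n.$$

If, denoting by $B_n$ the dispersion of the sum $x_1 + x_2 + \cdots + x_n$, the
quotient

$$\omega_n = \frac{\mu_{2+\delta}^{(1)} + \mu_{2+\delta}^{(2)} + \cdots + \mu_{2+\delta}^{(n)}}{B_n^{1+\frac{\delta}{2}}}$$

tends to 0 as $n \to \infty$, the probability of the inequality

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}} < t$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$
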


It is natural that the complete proof of a theorem of such character 
cannot be too short, and to make the proof clearer it is advisable to 
divide it into logically separated parts. 

3. The Fundamental Lemma. Let $s_n$ be a variable, depending on an
integer n, with the mean = 0 and the standard deviation = 1. If its
characteristic function

$$\varphi_n(v) = E(e^{ivs_n})$$

tends to

$$e^{-\frac{v^2}{2}}$$

uniformly in any given finite interval $(-l, l)$, then the distribution function
$F_n(t)$ of $s_n$ tends uniformly (in the domain of all real values of t) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

¹ Lindeberg's proof, as well as later proofs by P. Lévy and others, make use of an
ingenious artifice due to Liapounoff. Lindeberg explicitly acknowledges his indebtedness
to Liapounoff, while Lévy and other French writers fail to give due credit to the
great Russian mathematician.

Proof. a. Together with the variable $s_n$, whose distribution function
is $F_n(t)$, Liapounoff considers another variable

$$T_n = s_n + y$$

where $y$ is a normally distributed variable with the distribution function

$$G(y) = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{y} e^{-h^2x^2}\,dx.$$

Denoting the distribution function of $T_n$ by $H_n(t)$, we have (Chap. XIII,
Sec. 7)

(1) $$H_n(t) = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF_n(x)\int_{-\infty}^{t-x} e^{-h^2u^2}\,du.$$

On account of the inequality 


1 r*" 1 

-^7= e-^^du g T ^ 0 

x/ttJt ^ 


we have: 


For t — X < ^: 


For t — X > 0: 


r~h 0f V 

= j e-^^du = -H-e V A / ; 0 < ^ 1. 

rJ - oo A 

r * e-<^^du =1 ^ f e-'^^du = 1 - . 

J — 00 'y' A 


0 < e" ^ 1. 


Hence, introducing these expressions into (1), 

EniS) = f ^ r e“(^) dF„(x) 


dF n{x) — e C ) dF„{x) 


where again $0 < \theta < 1$; $0 < \theta_1 < 1$. This leads to the following

inequality:

X eo /i a;\ 3 

e~y^)dFnix). 
and consequently


|H.(0 - Pnit)\ < 


e 4 e~^^^ipn{v)dv 


( 2 ) 


e ^ [<pn(v) - e ^dv + 


4-1 e ^ 


Here we split the first integral into three Ji, J 2 , Jz, taken respectively 
between limits — 00 , — Z; --1^ 1; I j 4- and denote the second integral 


by Ji. Since \(Pn{v) — e ^ 2, we shall have 


-r^\J 1 + Js] < 
4v‘?r 


e ^ dv < 


2 e ^ 
’x/tt 


because 


e~'^^du < 


for positive x. Also 




e ^dv 


To estimate J 2 we shall denote by en{l) the maximum of \<Pniv) — e in 
the interval ^ v SI- Then 


^ I r I ^ ^€71 (Z) 

4 ^, 


6 ^ dv (J) • 


Finally, taking into account (2), (3), (4), and (5), we find 

(hlY 

(6) - F„m < ^ 


b. Expression (1) of Hn{t) can be transformed in a manner similar 
to that employed in Chap. XIII, Sec. 7, if we first write 

t—x i~x 

f ^ e-’^^du = J + _L r * e-n^du. 

V’tJ- « ^ VTrJo 
Thus we get


^n{t) = - + — e * ■ 

2 27rJ-oo 


or 


Hn(t) 

Now 


= 5 +-r 

Ttjo 


t?2 . 

---sal'* + , 

V 2Tr 


L f" 

2 zrJ-. 


. ??2 


tv 


-{e 2 — (p^{v))dv. 


^Jo 2 ^ a-Jo 


^2^2 1)2 • , 
— 1 — o-sm tv 


4 2 


c?z; 


< 


/72 f « 

^ ve ^dv 
47rJo 4:7r 


since 


A 2»2 


and consequently 


(7) 


k(i) - J - - f 

1 I' irjo 


0 < 1 


00 ?)2 . 
“o-sm tv 

g 2_ 


< 






< 


At ^ 2 xJ- „ 


00 fe22,2| 




M 




To find an upper bound of the integral in the right member, we split 
it into five integrals Ii, 1 2 , Iz, I a, 1 5 taken respectively between limits 
— 00 , — Z; —Z, —X; —X, X; X, Z; Z, + 00 . To estimate J 3 , we notice 
that 


Wn{v) 




xHFnix) = 


< ^ 
= 2 


and 

Hence 

(8) 


\q^n{v) - e 2 ] g 


To estimate J 2 + ^ 4 , we use the inequality \iPn{v) — e ^ €n(Z) and we 

get 

6 


Lf, + 7.1 S 


-rdv ^ 6„(0 


(9) ^ 

Finally, dealing with h and Is, we use the obvious inequality 


\<Pn{.v) 


e 21 g 2 
and we obtain


( 10 ) 


2 t' 


+ U 


-f' 


< ifU 

V TT {hlf 


Taking into account (7), (8), (9), and (10), the following inequality 
results: 


H„{t) 


1 _ 1 r 

2 irjo 




yap 

^ hy , , e„(/) , 4e 4 

^ + 27r ^ (hir ' 


In it, since X is still at our disposal, we can take 

X = e„(l)ih-K 

The inequality thus obtained when combined with (6) gives (a = kl) 


(11) 


J TTjo V 


dv\ 


<4lJ+ + + 


+ 




(l)^ + 5««(0. 


Here a and I are arbitrary positive numbers. We dispose of them in 
the following manner: Given an arbitrary positive number e, we take a 
so large as to have 

4e ^ 2 6^1 

TT 'y^ a 3^ 

and after that we select I large enough to make 

^ 1 

V8J ^ ^ 3"' 

Finally, since for a fixed $l$, $\epsilon_n(l)$, by hypothesis, tends to 0 when $n \to \infty$,
there exists a number $n_0$ such that






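for all $n > n_0$. The inequality (11) then shows that

$$\left|F_n(t) - \frac{1}{2} - \frac{1}{\pi}\int_0^{\infty} e^{-\frac{v^2}{2}}\,\frac{\sin tv}{v}\,dv\right| < \epsilon$$

for $n > n_0$ and this means that

$$\lim_{n\to\infty} F_n(t) = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty} e^{-\frac{v^2}{2}}\,\frac{\sin tv}{v}\,dv = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

uniformly in t because the number $n_0$, as clearly follows from the preceding
analysis, depends upon $\epsilon$ only and not upon t.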
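
The lemma rests on the fact that one half plus the sine integral, with $e^{-v^2/2}$ as the characteristic function, coincides with the normal distribution function. As a hedged numerical check (an illustration only, not part of the proof), the Python sketch below evaluates that integral by the midpoint rule; the truncation point `upper` and the number of `steps` are arbitrary assumptions.

```python
import math

def normal_cdf(t):
    """(1/sqrt(2*pi)) * integral of exp(-u^2/2) from -inf to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def sine_integral_form(t, upper=12.0, steps=20000):
    """1/2 + (1/pi) * integral_0^inf exp(-v^2/2) * sin(t*v)/v dv.

    The integral is truncated at `upper` (the integrand is negligible beyond it)
    and evaluated by the midpoint rule, which never touches v = 0.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += math.exp(-0.5 * v * v) * math.sin(t * v) / v
    return 0.5 + total * h / math.pi

if __name__ == "__main__":
    for t in (-2.0, 0.5, 3.0):
        print(t, sine_integral_form(t), normal_cdf(t))
```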

Remark 1. Without changing anything in the proof, we can state
the fundamental lemma in a slightly generalized form as follows: If $t_n$
tends to the limit t, the probability of the inequality

$$s_n < t_n$$

tends to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

Remark 2. The fundamental lemma, although not explicitly stated
by Liapounoff, is implicitly contained in his proof. More general
propositions of the same nature have been published by Pólya and Lévy.
The very elegant result due to the latter can be stated as follows: If
the characteristic function of the variable $s_n$ tends to the characteristic function

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

of a fixed distribution uniformly in any finite interval, then

$$\lim_{n\to\infty} F_n(t) = F(t)$$

at any point of continuity of $F(t)$.

The above proof, corresponding to the particular case

$$\varphi(t) = e^{-\frac{t^2}{2}},$$

can be used, almost without any changes, in proving the general proposition
of Lévy.

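4. Proof of Liapounoff's Theorem. a. If Liapounoff's condition

$$\frac{\mu_{2+\delta}^{(1)} + \mu_{2+\delta}^{(2)} + \cdots + \mu_{2+\delta}^{(n)}}{B_n^{1+\frac{\delta}{2}}} \to 0$$

is satisfied for a certain $\delta > 0$, it will be satisfied for all smaller $\delta$.

Let $f_i(t)$ be the distribution function of $x_i$ ($i = 1, 2, \ldots, n$). The
sum

$$f(t) = f_1(t) + f_2(t) + \cdots + f_n(t)$$

being a nondecreasing function of t, the following inequality holds
(Chap. XIII, Sec. 5):

$$\left(\int_{-\infty}^{\infty}|x|^{b}\,df(x)\right)^{a-c} \le \left(\int_{-\infty}^{\infty}|x|^{c}\,df(x)\right)^{a-b}\left(\int_{-\infty}^{\infty}|x|^{a}\,df(x)\right)^{b-c}$$

provided a > b > c > 0. We take here

$$a = 2 + \delta, \qquad b = 2 + \delta', \qquad c = 2$$

supposing $0 < \delta' < \delta$. Then

$$\int_{-\infty}^{\infty} x^2\,df(x) = B_n, \qquad \int_{-\infty}^{\infty}|x|^{2+\delta'}\,df(x) = \sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}, \qquad \int_{-\infty}^{\infty}|x|^{2+\delta}\,df(x) = \sum_{k=1}^{n}\mu_{2+\delta}^{(k)}$$

and the inequality becomes

$$\left(\sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}\right)^{\delta} \le B_n^{\delta-\delta'}\left(\sum_{k=1}^{n}\mu_{2+\delta}^{(k)}\right)^{\delta'}.$$

But this inequality is equivalent to

$$\left(\frac{\sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}}{B_n^{1+\frac{\delta'}{2}}}\right)^{\frac{1}{\delta'}} \le \left(\frac{\sum_{k=1}^{n}\mu_{2+\delta}^{(k)}}{B_n^{1+\frac{\delta}{2}}}\right)^{\frac{1}{\delta}}$$

and it shows that

$$\frac{\sum_{k=1}^{n}\mu_{2+\delta'}^{(k)}}{B_n^{1+\frac{\delta'}{2}}} \to 0$$

if

$$\frac{\sum_{k=1}^{n}\mu_{2+\delta}^{(k)}}{B_n^{1+\frac{\delta}{2}}} \to 0$$

provided $0 < \delta' < \delta$. Hence, in the proof we can assume that the fundamental
condition is satisfied for some positive $\delta \le 1$.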

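b. Liapounoff's inequality (Chap. XIII, Sec. 5) with c = 0, b = 2,
$a = 2 + \delta$ when applied to $x_i$ gives

$$b_i^{1+\frac{\delta}{2}} \le \mu_{2+\delta}^{(i)}; \qquad b_i = E(x_i^2).$$

Hence,

(12) $$\left(\frac{b_i}{B_n}\right)^{1+\frac{\delta}{2}} \le \omega_n$$

and, since it is assumed that $\omega_n \to 0$, all the quotients

$$\frac{b_i}{B_n} = \frac{b_i}{b_1 + b_2 + \cdots + b_n} \qquad (i = 1, 2, \ldots, n)$$

will converge to 0 uniformly as $n \to \infty$.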
c. The following formula can easily be obtained by means of integration
by parts:

— 1 ix — ~ ^ __ _ t)dt. 

A Jo 

If $x$ is real and in absolute value > 2, we have


■r 


a;2 - l)(l - t)dt 


^ < 


ja-p+j 

2** 


since 


|eixi _ i| ^ 2. 

If $|x| \le 2$, we can use the inequality


le«< _ i| g 2 


and find 


Thus, for every real x 


t 5 2!fL< 


1 

- 3~W ^ ~¥' 


^2 lr|2+5 

!-= l + + i^ISl. 


Substituting here 

rr = 

VBn 

and taking the mathematical expectation of both members, we have 
(13) vk{t) = = 1 - \e,\ ^ 1 . 

mXjji 1 _i — 

2^5„ 2 

Furthermore, since 

1 — a: = e“* — |a:2; a; > 0; 0 < ^ < 1, 


we can write 



If $\omega_n|t|^{2+\delta} < 1$, we shall have, by virtue of (12),

$$\frac{b_k t^2}{B_n} < 1$$

and consequently


\2bJ \2bJ \2B. 




1+1 


< 


2^ 2 


.,(k) 


This inequality, together with (13) and (14), leads to the following
expression of $\varphi_k(t)$:

(15) $$\varphi_k(t) = e^{-\frac{b_k t^2}{2B_n}}(1 + \sigma_k)$$

where 

(16) Ifffci < 


8 i+- 

5: 2 




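d. The characteristic function of the variable

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

is

$$\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t)$$

because $x_1, x_2, \ldots, x_n$ are independent variables. Hence, by (15)

$$\varphi(t) = e^{-\frac{t^2}{2}}(1 + \sigma_1)(1 + \sigma_2)\cdots(1 + \sigma_n).$$

Now

$$|(1 + \sigma_1)(1 + \sigma_2)\cdots(1 + \sigma_n) - 1| \le (1 + |\sigma_1|)(1 + |\sigma_2|)\cdots(1 + |\sigma_n|) - 1 < e^{|\sigma_1| + |\sigma_2| + \cdots + |\sigma_n|} - 1$$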

and 

(17) i^(o - -- 1 

taking into account inequalities (16). Inequality (17) holds if

$$\omega_n|t|^{2+\delta} < 1.$$

Suppose, now, that t is confined to an arbitrary finite interval $-l \le t \le l$.
Because $\omega_n$, by hypothesis, tends to 0, the difference


«3< o « z 2+^ 


will tend to 0 as $n \to \infty$. In connection with (17) this shows that

$$\varphi(t) \to e^{-\frac{t^2}{2}}$$

uniformly in any finite interval. It suffices now to invoke the fundamental
lemma to complete the proof of Liapounoff's theorem.
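
To see this uniform convergence in a concrete case, the following Python sketch (an illustration under stated assumptions, not part of the proof) takes identical variables uniformly distributed in $(-1/2, 1/2)$, for which the characteristic function of the normalized sum has the closed form $(\sin a / a)^n$ with $a = v\sqrt{3/n}$, and measures its largest distance from $e^{-v^2/2}$ on a grid of the interval $(-l, l)$. The grid size and the value $l = 5$ are arbitrary choices.

```python
import math

def phi_uniform_sum(n, v):
    """Characteristic function of (x_1+...+x_n)/sqrt(B_n), x_i uniform on (-1/2, 1/2).

    Here B_n = n/12 and E(exp(i*u*x_i)) = sin(u/2)/(u/2).
    """
    a = v * math.sqrt(3.0 / n)          # equals (v / sqrt(n/12)) / 2
    base = 1.0 if a == 0.0 else math.sin(a) / a
    return base ** n

def sup_distance(n, l=5.0, points=1001):
    """Largest value of |phi_n(v) - exp(-v^2/2)| on an even grid of (-l, l)."""
    worst = 0.0
    for k in range(points):
        v = -l + 2.0 * l * k / (points - 1)
        worst = max(worst, abs(phi_uniform_sum(n, v) - math.exp(-0.5 * v * v)))
    return worst

if __name__ == "__main__":
    for n in (4, 40, 400):
        print(n, sup_distance(n))
```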

5. Particular Cases. This theorem is extremely general and it is
hardly possible to find cases of any practical importance to which it
could not be applied. Two particularly significant cases deserve special
mention.

First Case. Let us suppose that variables $x_1, x_2, \ldots, x_n$ are bounded,
so that any possible value of any one of them is absolutely less than a
constant C. Evidently

$$E|x_i|^{2+\delta} \le C^{\delta}E(x_i^2) = C^{\delta}b_i$$

and hence

$$\omega_n \le \left(\frac{C}{\sqrt{B_n}}\right)^{\delta}.$$

It suffices to assume that

$$B_n = b_1 + b_2 + \cdots + b_n$$

tends to infinity to be sure that $\omega_n \to 0$. Hence, dealing with bounded
independent variables, the condition for the validity of the limit theorem
is

$$B_n \to \infty \quad \text{as} \quad n \to \infty,$$

which is equivalent to the statement that the series

$$b_1 + b_2 + b_3 + \cdots$$

is divergent.

Poisson's series of trials affords a good illustration of this case. In
the usual way, we attach to each of the trials a variable which assumes
two values, 1 and 0, according as an event E occurs or fails in that trial.
Let $p_i$ and $q_i = 1 - p_i$ be the respective probabilities of the occurrence
and failure of E in the ith trial. The variable $z_i$ attached to this trial
is defined by

$z_i = 1$ if E occurs,

$z_i = 0$ if E fails.

Noticing that

$$E(z_i) = p_i,$$

we introduce new variables

$$x_i = z_i - p_i \qquad (i = 1, 2, \ldots, n)$$

with the mean 0, whose sum is given by

$$m - np$$

where m is the number of occurrences of E in n trials and p the mean
probability

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}.$$

In our case

$$b_i = p_iq_i$$

and

$$B_n = p_1q_1 + p_2q_2 + \cdots + p_nq_n.$$

Hence, we can formulate the following theorem:

Theorem. The probability of the inequality

$$m - np < t\sqrt{B_n}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$, provided the series

$$\sum_{i=1}^{\infty} p_iq_i$$

is divergent. At the same time the probability of the inequalities

$$t_1\sqrt{B_n} < m - np < t_2\sqrt{B_n}$$

tends uniformly (in $t_1$, $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du.$$
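
The theorem for Poisson's series of trials can be tried out numerically. The Python sketch below is only an illustration: it builds the exact distribution of m by successive convolution and compares the probability of $m - np < t\sqrt{B_n}$ with the limit. The particular probabilities $p_i$, cycling through five values, are a hypothetical choice made for the example.

```python
import math

def normal_cdf(t):
    """(1/sqrt(2*pi)) * integral of exp(-u^2/2) from -inf to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def poisson_trials_distribution(ps):
    """Exact distribution of m, the number of occurrences of E in independent
    trials whose probabilities are ps[0], ps[1], ..."""
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for m, pr in enumerate(dist):
            new[m] += pr * (1.0 - p)      # E fails in this trial
            new[m + 1] += pr * p          # E occurs in this trial
        dist = new
    return dist

def max_gap(ps, ts=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Largest difference, on a few test points, between the exact probability
    of m - n*p < t*sqrt(B_n) and its normal limit."""
    mean = sum(ps)                              # n * p, p being the mean probability
    b_n = sum(p * (1.0 - p) for p in ps)        # B_n = sum of p_i * q_i
    dist = poisson_trials_distribution(ps)
    worst = 0.0
    for t in ts:
        bound = mean + t * math.sqrt(b_n)
        prob = sum(pr for m, pr in enumerate(dist) if m < bound)
        worst = max(worst, abs(prob - normal_cdf(t)))
    return worst

if __name__ == "__main__":
    # hypothetical probabilities 0.2, 0.35, 0.5, 0.65, 0.8 repeated in turn
    ps = [0.2 + 0.15 * (i % 5) for i in range(1000)]
    print(max_gap(ps))
```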

Second Case. Let $z_1, z_2, \ldots, z_n$ be identical variables with the
common mean a and dispersion b. Supposing that for some positive $\delta$

$$E|z_i - a|^{2+\delta} = c$$

exists, we have

$$\omega_n = \frac{nc}{(nb)^{1+\frac{\delta}{2}}} = \frac{c}{b^{1+\frac{\delta}{2}}}\,n^{-\frac{\delta}{2}}$$

and hence $\omega_n \to 0$ as $n \to \infty$. The limit theorem applied to this case
can be stated as follows:

The probability of the inequality

$$z_1 + z_2 + \cdots + z_n - na < t\sqrt{nb}$$

tends uniformly to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du,$$

provided

$$E|z_i - a|^{2+\delta}$$

exists for some positive $\delta$. As a corollary we have: The probability of the
inequalities

$$a - t\sqrt{\frac{b}{n}} < \frac{z_1 + z_2 + \cdots + z_n}{n} < a + t\sqrt{\frac{b}{n}}$$

tends to

$$\frac{2}{\sqrt{2\pi}}\int_{0}^{t} e^{-\frac{u^2}{2}}\,du.$$


This proposition is regarded as justification of the ordinary procedure 
of taking a mean of several observed measurements of the same quantity, 
made under the same conditions, to approximate its “true value.” 
Barring systematical errors which should be eliminated by a careful 
study of the tools used for measurements, the true value of the unknown 
quantity is regarded as coinciding with the expectation of a set of poten- 
tially possible values each having a certain probability of materializing 
in actual measurement. Since for comparatively small t the above
integral comes very near to 1 and

$$t\sqrt{\frac{b}{n}}$$

for large n becomes as small as we please, the probability of the mean of a
very large number of observations deviating very little from the true 
value of the quantity to be measured, will be close to 1 and herein lies 
the justification of the rule of mean mentioned above. 
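
The two quantities in this argument are easy to tabulate. The short Python sketch below (an illustration only; the function names are assumptions) evaluates the limiting probability that the mean of n measurements lies within $t\sqrt{b/n}$ of the true value, and the width $t\sqrt{b/n}$ itself.

```python
import math

def probability_within(t):
    """Limit (2/sqrt(2*pi)) * integral_0^t exp(-u^2/2) du of the corollary."""
    return math.erf(t / math.sqrt(2.0))

def half_width(t, b, n):
    """Deviation t*sqrt(b/n) allowed for the mean of n observations with dispersion b."""
    return t * math.sqrt(b / n)

if __name__ == "__main__":
    # with t = 3 the limiting probability is already about 0.997
    print(probability_within(3.0))
    for n in (100, 10_000, 1_000_000):
        print(n, half_width(3.0, 4.0, n))
```

For a fixed t the probability does not depend on n, while the admissible deviation decreases like the reciprocal square root of n.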


Estimation of the Error Term

6. The limit theorem is a proposition of an essentially asymptotic
character. It states merely that the distribution function $F_n(t)$ of the
variable

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as n becomes infinite when a certain condition is fulfilled. For practical
purposes it is very important to estimate the error committed by replacing
$F_n(t)$ by its limit when n is a finite but very large number. In his
original paper Liapounoff had this important problem in his mind and
for that reason entered into more detailed elaboration of various parts
of his proof than was strictly necessary to establish an asymptotic
theorem.

We do not intend to reproduce here this part of Liapounoff's investigation;
it suffices to indicate the final result. Assuming the existence of
absolute moments of the third order $E|x_i|^3$; $i = 1, 2, \ldots, n$, we shall
suppose n so large that


= 


Then, setting 


we shall have 


__ + ^ 1 

^ 20 ' 

e~^^^du + R, 


Fr.it) 


|i?| < ^COn 


JsJ-. 

('‘’® k)’ + i-i] + “5 ss + 


Although this limit for the error term is probably too high, it seems 
to be the best available. However, it is greatly desirable to have a more 
genuine estimation of $R$.

7. Hypothesis of Elementary Errors. It is considered as an experi- 
mental fact that accidental errors of observations (or measurements) 
follow closely the law of normal distribution. In the sphere of biology, 
similar phenomena have been observed as to the size of the bodies and 
various organs of living organisms. What can be suggested as an 
explanation of these observed facts? In regard to errors of observations, 
Laplace proposed a hypothesis which may sound plausible. He considers 
the total error as a sum of numerous very small elementary errors due 
to independent causes. 

It can hardly be doubted that various independent or nearly inde- 
pendent causes contribute to the total error. In astronomical observa- 
tions, for instance, slight changes in the temperature, irregular currents 
of air, vibrations of buildings, and even the state of the organs of percep- 
tion of an observer may be considered as but a small part of such causes. 
One can easily understand that the growth of the organs of living organ- 
isms is also dependent on many factors of accidental character which 
independently tend to increase or decrease the size of the organs. If, 
on the ground of such evidence, we accept Laplace's hypothesis, we can
try the explanation of the normal law of distribution on the basis of the 
general theorems established above. 

Suppose that elementary errors do not exceed in absolute value a
certain number $l$, very small compared with the standard deviation $\sigma$
of their sum. The quantity denoted by $\omega_n$ in the preceding section will
be less than the ratio $l/\sigma$ and hence will be a small number; and the same
will be true of the error term $R$. Hence, the distribution of the total
error will be nearly normal.

Laplace's explanation of the observed prevalence of normal distributions
may be accepted as plausible, at least. But the question may be
raised whether elementary errors are small enough and numerous enough 
to make the difference between the true distribution function of the total 
error and that of a normal distribution small. Besides, Laplace’s 
hypothesis is based on the principle of superposition of small effects and 
thus introduces another assumption of an arbitrary character. 

Finally, the experimental data quoted in support of the normal dis- 
tribution of errors of observations and biological measurements are not 
numerous enough for one to place full confidence in them. Hence, the 
widely accepted statistical theories based on the normal law of distribu- 
tion cannot be fully relied on and may be considered merely as substitutes 
for more accurate knowledge which we do not yet possess in dealing with 
problems of vital importance in the sphere of human activities. 


Limit Theorems for Dependent Variables 

8 . The fundamental limit theorem can be extended to sums of depend- 
ent variables as, under special assumptions, was shown first by Markoff 
and later by S. Bernstein, whose work may be considered an outstanding 
recent contribution to the theory of probability. However, the condi- 
tions for the validity of the theorems established by Bernstein are rather 
complicated, and the whole subject seems to lack ultimate simplicity. 
For that reason we confine ourselves here to a few special cases. 

Example 1. Let us consider a simple chain in which probabilities for an event E
to occur in any trial are p' and p'', respectively, according as E occurred or failed in
the preceding trial. The probability for E to occur at the nth trial when the results of
other trials are unknown is

$$p_n = p + (p_1 - p)\delta^{n-1}$$

where $p_1$ is the initial probability, $\delta = p' - p''$ and

$$p = \frac{p''}{1 - \delta}.$$

The mean probability for n trials is given by

$$\bar{p}_n = p + \frac{p_1 - p}{n}\cdot\frac{1 - \delta^n}{1 - \delta}$$

so that p may be considered as the mean probability in infinitely many trials.
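
The closed expressions for $p_n$ and for the mean probability can be checked against the defining recurrence of the chain, in which the probability at each trial is $p'$ times the preceding probability plus $p''$ times its complement. The Python sketch below is such a check, offered as an illustration; the function and argument names are assumptions, with `p_after_e` and `p_after_f` standing for p' and p''.

```python
def chain_probabilities(n, p1, p_after_e, p_after_f):
    """Probabilities p_1, ..., p_n of E at each trial, obtained step by step."""
    probs = [p1]
    for _ in range(n - 1):
        prev = probs[-1]
        probs.append(prev * p_after_e + (1.0 - prev) * p_after_f)
    return probs

def closed_form(k, p1, p_after_e, p_after_f):
    """p_k = p + (p_1 - p) * delta**(k-1) with delta = p' - p'', p = p''/(1 - delta)."""
    delta = p_after_e - p_after_f
    p = p_after_f / (1.0 - delta)
    return p + (p1 - p) * delta ** (k - 1)

def mean_closed_form(n, p1, p_after_e, p_after_f):
    """Mean probability for n trials according to the formula of the text."""
    delta = p_after_e - p_after_f
    p = p_after_f / (1.0 - delta)
    return p + (p1 - p) / n * (1.0 - delta ** n) / (1.0 - delta)

if __name__ == "__main__":
    probs = chain_probabilities(10, 0.9, 0.7, 0.2)
    print(probs[-1], closed_form(10, 0.9, 0.7, 0.2))
    print(sum(probs) / 10, mean_closed_form(10, 0.9, 0.7, 0.2))
```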

In the usual way, to trials 1, 2, 3, . . . we attach variables $x_1, x_2, x_3, \ldots$ so that
in general

$$x_i = 1 - p_i$$

or

$$x_i = -p_i$$

according as E occurs or fails in the ith trial. If m is the number of occurrences of
E in n trials, the sum

$$x_1 + x_2 + \cdots + x_n$$

of dependent variables represents

$$m - n\bar{p}_n.$$

Evidently

$$E(m - n\bar{p}_n) = 0$$

and, as we have seen in Chap. XI, Sec. 7,

$$B_n = E(m - n\bar{p}_n)^2 \sim npq\,\frac{1 + \delta}{1 - \delta};$$

that is, the ratio

$$B_n : npq\,\frac{1 + \delta}{1 - \delta}$$

tends to 1 as n becomes infinite.


In order to find an appropriate expression of the characteristic function of the
quotient

$$\frac{m - n\bar{p}_n}{\sqrt{B_n}}$$

we shall endeavor first to find the generating function $\omega_n(t)$ for probabilities

$$P_{m,n}; \qquad m = 0, 1, 2, \ldots, n$$

to have exactly m occurrences of E in n trials. Let $A_{m,n}$ be the probability of m
occurrences when the whole series ends with E and similarly $B_{m,n}$ the probability of
m occurrences when this series ends with F, the event opposite to E. The following
relations follow immediately from the definition of a chain:

(18) $$A_{m,n+1} = A_{m-1,n}\,p' + B_{m-1,n}\,p'', \qquad B_{m,n+1} = A_{m,n}\,q' + B_{m,n}\,q''.$$

Let

$$\theta_n(t) = \sum_{m=0}^{\infty} A_{m,n}t^m, \qquad \psi_n(t) = \sum_{m=0}^{\infty} B_{m,n}t^m$$

be the generating functions of $A_{m,n}$ and $B_{m,n}$. From relations (18) it follows that

(19) $$\theta_{n+1}(t) = p't\,\theta_n(t) + p''t\,\psi_n(t), \qquad \psi_{n+1}(t) = q'\theta_n(t) + q''\psi_n(t).$$

These relations established for $n \ge 1$ will hold even for n = 0 if we define $\theta_0(t)$ and
$\psi_0(t)$ by

$$p'\theta_0 + p''\psi_0 = p_1, \qquad q'\theta_0 + q''\psi_0 = 1 - p_1$$

whence

$$\theta_0 + \psi_0 = 1.$$

From (19) one can easily conclude that both $\theta_n(t)$ and $\psi_n(t)$ satisfy the same equation
in finite differences of the second order

$$\theta_{n+2} - (p't + q'')\theta_{n+1} + \delta t\,\theta_n = 0$$

$$\psi_{n+2} - (p't + q'')\psi_{n+1} + \delta t\,\psi_n = 0.$$





Evidently

$$P_{m,n} = A_{m,n} + B_{m,n};$$

hence

$$\omega_n(t) = \theta_n(t) + \psi_n(t)$$

satisfies the equation

(20) $$\omega_{n+2} - (p't + q'')\omega_{n+1} + \delta t\,\omega_n = 0$$

and is completely determined by it and the initial conditions

$$\omega_0 = 1, \qquad \omega_1 = q_1 + p_1t.$$

Since

$$p' = p + q\delta, \qquad q'' = q + p\delta$$

the characteristic equation corresponding to (20) can be written

$$(r - 1)(r - \delta) = (t - 1)[(p + q\delta)r - \delta]$$

and for small $t - 1$ its roots can be expanded into power series

$$r_1 = 1 + c_1(t - 1) + c_2(t - 1)^2 + \cdots$$

$$r_2 = \delta + d_1(t - 1) + d_2(t - 1)^2 + \cdots.$$

The general expression of $\omega_n(t)$ will be

$$\omega_n(t) = Ar_1^n + Br_2^n$$

where to satisfy the initial conditions we must take

$$A = \frac{r_2 - q_1 - p_1t}{r_2 - r_1}; \qquad B = \frac{-r_1 + q_1 + p_1t}{r_2 - r_1}.$$

Having found $\omega_n(t)$, the characteristic function of

$$s_n = \frac{m - n\bar{p}_n}{\sqrt{B_n}}$$

will be given by

$$\varphi_n(v) = e^{-in\bar{p}_n\frac{v}{\sqrt{B_n}}}\,\omega_n\!\left(e^{\frac{iv}{\sqrt{B_n}}}\right).$$

To study the asymptotic behavior of $\varphi_n(v)$ when v is confined to a finite fixed
interval $-l \le v \le l$, we notice that then

$$u = \frac{v}{\sqrt{B_n}}$$

will be well within the convergence region of the series we are going to consider now.
By means of Lagrange's series or otherwise, we find the following expansion of $\log r_1$ in
power series of $t - 1$

$$\log r_1 = p(t - 1) + \left[\frac{pq\delta}{1 - \delta} - \frac{p^2}{2}\right](t - 1)^2 + \cdots$$

convergent for sufficiently small values of $t - 1$. By setting $t = e^{iu}$ we obtain another
power series in u

$$\log r_1 = piu - \frac{pq}{2}\,\frac{1 + \delta}{1 - \delta}\,u^2 + \cdots$$

convergent for sufficiently small u. Hence

. 1 + 5 w®, ,/\ 

npm — npft — r -r + nu^g{u} 
f = e 1-5 2 

— +npS7^ — nu^giu) 

= e ^ 

where $g(u)$ is a bounded function of $u$, $u$ being contained in a certain interval $(-r, r)$.
By substituting

$$u = \frac{v}{\sqrt{B_n}}$$

here, we easily conclude that

$$e^{-in\bar{p}_n\frac{v}{\sqrt{B_n}}}\,r_1^n$$

tends uniformly to the limit

$$e^{-\frac{v^2}{2}}$$

in the interval $-l \le v \le l$ while

$$e^{-in\bar{p}_n\frac{v}{\sqrt{B_n}}}\,r_2^n$$

remains there uniformly bounded. Since, as can easily be seen, A and B can be
represented by power series

$$A = 1 + a_1u + a_2u^2 + \cdots$$

$$B = -a_1u - a_2u^2 - \cdots$$

A tends uniformly to 1 and B tends uniformly to 0. Hence, finally, $\varphi_n(v)$ in any fixed
interval $-l \le v \le l$ tends uniformly to $e^{-\frac{v^2}{2}}$. It suffices to apply the fundamental
lemma to conclude that the probability of the inequality

$$m - n\bar{p}_n < t_n\sqrt{B_n}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

if $t_n$ tends to t.


Since $B_n$ is asymptotic to $npq\,\frac{1+\delta}{1-\delta}$ and $\bar{p}_n$ differs from p by a quantity of the order
1/n, the inequality

$$m - np < t\sqrt{npq\,\frac{1 + \delta}{1 - \delta}}$$

can be written in the form

$$m - n\bar{p}_n < t_n\sqrt{B_n}$$

with $t_n$ tending to t, whence, using the above established result, the following theorem
due to Markoff can be derived:

Theorem. For a simple chain the probability of the inequalities

$$t_1\sqrt{npq\,\frac{1 + \delta}{1 - \delta}} < m - np < t_2\sqrt{npq\,\frac{1 + \delta}{1 - \delta}}$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$.
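
Relations (18) lend themselves directly to computation. The Python sketch below (an illustration, not part of the original argument) iterates them to obtain the exact distribution of m for a simple chain and then compares the dispersion of m with the asymptotic value $npq\frac{1+\delta}{1-\delta}$. The function names and the numerical values of $p_1$, p', p'' are assumptions chosen for the example; the first trial is started with $A_{1,1} = p_1$, $B_{0,1} = 1 - p_1$.

```python
def chain_count_distribution(n, p1, p_e, p_f):
    """Distribution of the number m of occurrences of E in n >= 1 trials of a
    simple chain; p_e and p_f stand for p' and p'' of the text."""
    a = [0.0] * (n + 1)     # a[m]: m occurrences, series ends with E
    b = [0.0] * (n + 1)     # b[m]: m occurrences, series ends with F
    a[1], b[0] = p1, 1.0 - p1
    for _ in range(n - 1):
        new_a = [0.0] * (n + 1)
        new_b = [0.0] * (n + 1)
        for m in range(n + 1):
            if m >= 1:
                new_a[m] = a[m - 1] * p_e + b[m - 1] * p_f      # relations (18)
            new_b[m] = a[m] * (1.0 - p_e) + b[m] * (1.0 - p_f)
        a, b = new_a, new_b
    return [x + y for x, y in zip(a, b)]

def mean_and_dispersion(dist):
    mean = sum(m * pr for m, pr in enumerate(dist))
    disp = sum((m - mean) ** 2 * pr for m, pr in enumerate(dist))
    return mean, disp

def dispersion_ratio(n, p1, p_e, p_f):
    """Ratio of the exact dispersion of m to n*p*q*(1+delta)/(1-delta)."""
    delta = p_e - p_f
    p = p_f / (1.0 - delta)
    q = 1.0 - p
    _, disp = mean_and_dispersion(chain_count_distribution(n, p1, p_e, p_f))
    return disp / (n * p * q * (1.0 + delta) / (1.0 - delta))

if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, dispersion_ratio(n, 0.9, 0.7, 0.2))
```

The ratio approaches 1 as n grows, which is the statement about $B_n$ used in deriving the theorem.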

Example 2. Considering an indefinite series of Bernoullian trials with the probability
p for an event A to occur, we can regard pairs of consecutive trials 1 and 2,
2 and 3, 3 and 4, and so on, as forming a new series of trials which may produce an
event E consisting of two successive occurrences of A (E = AA) or an event F opposite
to E (F = AB, BA, BB). With respect to E the trials of the new series are no longer
independent. Let m be the number of occurrences of E in n trials. Then

$$E(m - np^2) = 0$$

and

$$B_n = E(m - np^2)^2 = np^2q(1 + 3p) - 2p^3q$$

as was shown in Chap. XI, Sec. 6.

Let $P_{m,n}$ be the probability of exactly m occurrences of E in a series of n trials.
Evidently

$$P_{m,n} = A_{m,n} + B_{m,n}$$

where $A_{m,n}$ and $B_{m,n}$ are the probabilities of m occurrences of E when the Bernoullian
series of n + 1 trials ends with A or B, respectively. By an easy application of the
theorems of total and compound probabilities we get

$$A_{m,n+1} = A_{m-1,n}\,p + B_{m,n}\,p$$

$$B_{m,n+1} = A_{m,n}\,q + B_{m,n}\,q.$$

Corresponding to these relations the generating functions

$$\theta_n(t) = \sum_{m=0}^{\infty} A_{m,n}t^m, \qquad \psi_n(t) = \sum_{m=0}^{\infty} B_{m,n}t^m$$

satisfy the following equations in finite differences:

$$\theta_{n+1} = pt\,\theta_n + p\,\psi_n$$

$$\psi_{n+1} = q\,\theta_n + q\,\psi_n$$

holding even for n = 0 if we set $\theta_0 = p$, $\psi_0 = q$. Hence, it follows that $\theta_n(t)$ and
$\psi_n(t)$ satisfy the same equations of the second order

$$\theta_{n+2} - (pt + q)\theta_{n+1} + pq(t - 1)\theta_n = 0$$

$$\psi_{n+2} - (pt + q)\psi_{n+1} + pq(t - 1)\psi_n = 0$$

and so does their sum

$$\omega_n(t) = \theta_n(t) + \psi_n(t) = \sum_{m=0}^{\infty} P_{m,n}t^m.$$
Thus, to determine $\omega_n(t)$ we have the equation

$$\omega_{n+2} - (pt + q)\omega_{n+1} + pq(t - 1)\omega_n = 0$$

and the initial conditions

$$\omega_0 = 1, \qquad \omega_1 = 1 - p^2 + p^2t.$$

The general expression of $\omega_n(t)$ is

$$\omega_n(t) = Ar_1^n + Br_2^n = Ar_1^n + Bp^nq^n(t - 1)^n r_1^{-n}$$

where $r_1$ and $r_2$ are roots of the equation

$$r^2 - r = p(t - 1)(r - q)$$

and

$$A = \frac{-r_2 + 1 + p^2(t - 1)}{r_1 - r_2}; \qquad B = \frac{r_1 - 1 - p^2(t - 1)}{r_1 - r_2}.$$

If $r_1$ is the root which for t = 1 reduces to 1, we easily find the following series

$$\log r_1 = p^2(t - 1) + \frac{p^2(2pq - p^2)}{2}(t - 1)^2 + \cdots$$

or, setting $t = e^{iu}$ and supposing u sufficiently small,

$$\log r_1 = ip^2u - \frac{p^2q(1 + 3p)}{2}u^2 + \cdots.$$

As to A and B, they can be developed into series of the form

$$A = 1 + cu^2 + \cdots$$

$$B = -cu^2 - \cdots.$$

Hence, reasoning in the same manner as in Example 1, we can conclude that the
characteristic function

$$\varphi_n(v) = e^{-\frac{inp^2v}{\sqrt{B_n}}}\,\omega_n\!\left(e^{\frac{iv}{\sqrt{B_n}}}\right)$$

of the variable

$$\frac{m - np^2}{\sqrt{B_n}}$$

tends to the limit $e^{-\frac{v^2}{2}}$ uniformly in any finite and fixed interval $-l \le v \le l$. Referring,
finally, to the fundamental lemma, we reach the following conclusion: The
probability of the inequalities

$$t_1\sqrt{np^2q(1 + 3p)} < m - np^2 < t_2\sqrt{np^2q(1 + 3p)}$$

tends uniformly (with respect to $t_1$ and $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$.
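
The recurrences of Example 2 can also be run numerically. The following Python sketch (an illustration; the function names are assumptions) iterates the relations between the probabilities of ending with A or with B, starting from n = 0, where a single Bernoullian trial has been made and no pair is yet complete, and confirms the mean $np^2$ and the dispersion $np^2q(1+3p) - 2p^3q$ quoted from Chap. XI.

```python
def pair_count_distribution(n, p):
    """Distribution of the number m of occurrences of E = AA among the n
    overlapping pairs formed by n + 1 Bernoullian trials."""
    q = 1.0 - p
    a = [0.0] * (n + 1)     # a[m]: m occurrences, last trial gave A
    b = [0.0] * (n + 1)     # b[m]: m occurrences, last trial gave B
    a[0], b[0] = p, q       # n = 0: one trial, no pair yet
    for _ in range(n):
        new_a = [0.0] * (n + 1)
        new_b = [0.0] * (n + 1)
        for m in range(n + 1):
            # a new A completes a pair AA only if the previous trial gave A
            new_a[m] = b[m] * p + (a[m - 1] * p if m >= 1 else 0.0)
            new_b[m] = (a[m] + b[m]) * q
        a, b = new_a, new_b
    return [x + y for x, y in zip(a, b)]

def mean_and_dispersion(dist):
    mean = sum(m * pr for m, pr in enumerate(dist))
    disp = sum((m - mean) ** 2 * pr for m, pr in enumerate(dist))
    return mean, disp

if __name__ == "__main__":
    n, p = 50, 0.3
    q = 1.0 - p
    print(mean_and_dispersion(pair_count_distribution(n, p)))
    print(n * p * p, n * p * p * q * (1 + 3 * p) - 2 * p ** 3 * q)
```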
Problems for Solution

1. Consider a series of independent variables $x_1, x_2, x_3, \ldots$ where in general
$x_k$ (k = 1, 2, 3, . . . ) can have only two values $k^{\alpha}$ and $-k^{\alpha}$ each with the probability
1/2. Show that the limit theorem holds for the variables thus defined if $\alpha > -\tfrac{1}{2}$
but the law of large numbers holds only if $\alpha < \tfrac{1}{2}$.

Solution. Evidently

$$E(x_k) = 0, \qquad E(x_k^2) = k^{2\alpha}, \qquad E|x_k|^3 = k^{3\alpha}.$$

From Euler's formula (Appendix I) we derive two asymptotic expressions

$$B_n = 1^{2\alpha} + 2^{2\alpha} + \cdots + n^{2\alpha} \sim \frac{n^{2\alpha+1}}{2\alpha + 1}$$

$$1^{3\alpha} + 2^{3\alpha} + \cdots + n^{3\alpha} \sim \frac{n^{3\alpha+1}}{3\alpha + 1}.$$

Hence

$$\omega_n \sim \frac{(2\alpha + 1)^{\frac{3}{2}}}{3\alpha + 1}\,n^{-\frac{1}{2}}; \qquad \omega_n \to 0$$

so that the limit theorem holds. For $\alpha = \tfrac{1}{2}$ the probability of the inequalities

$$-\epsilon < \frac{x_1 + x_2 + \cdots + x_n}{n} < \epsilon$$


tends to the limit 



and the law of large numbers does not hold. 
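
The behaviour of $\omega_n$ in Problem 1 can be observed directly. The Python sketch below is an illustration only: it computes the ratio of the sum of third absolute moments to the 3/2 power of $B_n$ (that is, Liapounoff's quotient with $\delta = 1$) for the variables $\pm k^{\alpha}$ and sets it beside the asymptotic expression of the solution. The function names are assumptions.

```python
def liapounoff_ratio(n, alpha):
    """omega_n with delta = 1 for x_k = +k**alpha or -k**alpha, each with probability 1/2."""
    third = sum(k ** (3.0 * alpha) for k in range(1, n + 1))   # sum of E|x_k|^3
    b_n = sum(k ** (2.0 * alpha) for k in range(1, n + 1))     # B_n = sum of E(x_k^2)
    return third / b_n ** 1.5

def asymptotic_ratio(n, alpha):
    """(2*alpha + 1)**(3/2) / (3*alpha + 1) * n**(-1/2), the expression in the solution."""
    return (2.0 * alpha + 1.0) ** 1.5 / (3.0 * alpha + 1.0) / n ** 0.5

if __name__ == "__main__":
    for n in (100, 1000, 20000):
        print(n, liapounoff_ratio(n, 0.5), asymptotic_ratio(n, 0.5))
```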

2. Let $m_i$ be the number of successes in i Bernoullian trials with the probability p.
Show that the limit theorem holds for variables

$$s_i = \frac{m_i - ip}{\sqrt{ipq}}; \qquad i = 1, 2, \ldots, n$$

but the law of large numbers does not hold (Bernstein).

Hint:

$$s_1 + s_2 + \cdots + s_n = (pq)^{-\frac{1}{2}}\left[x_1\left(1 + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}}\right) + x_2\left(\frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}}\right) + \cdots + \frac{x_n}{\sqrt{n}}\right]$$

where $x_1, x_2, \ldots, x_n$ are independent variables with two values q and −p associated
in the customary way with trials 1, 2, . . . n.

3. Consider an infinite sequence of independent variables Xi, Xz, xzj . . . where 
Xk can have three values 


0, (log ky, “(log ky 
with the corresponding probabilities

^ ^ 2 ^ 1 ^ 1 

{h + ol) (log {h -f- {h -f a) (log {k + cl)\p (k + a) (log (k + a)}P 

€L being a sufficiently large constant. Moreover, ^ and p satisfy the inequality 

2p — p + 1 >0. 

Show (a) that Liapounoff's condition is satisfied when p < 1 and hence the limit
theorem holds; (b) that this condition is not satisfied if $p \ge 1$ and at the same time the
limit theorem fails at least for p > 1.

Solution, a. By using Euler’s formula we find 


03n 


1 + - 

(2m + 1 - p) ^ 
(2 + S)m + 1 - P 


[log (ra + a)}2^ 


Hence the first part is answered. 

6. The probability of the inequality 


is less than 


iCi + ^2 4“ • • • 4" ^ 0 


^2 


1 

{k 4“ ol) (log {k 4- ol)]^ 


and this, in case p > 1, is less than 


-(log a)^-p. 

p — 1 

Hence, the probability of the equality 

Xi X2 * • • 4" = 0 


remains always >1 (log q:)^ ^ and the limit theorem cannot hold. Note 

p _ 1 

that J5n 00 because 2p — p 4“ 1 > 0. 

4. Prove the asymptotic formula

$$1 + \frac{n}{1} + \frac{n^2}{1\cdot 2} + \cdots + \frac{n^n}{1\cdot 2\cdot 3\cdots n} \sim \frac{e^n}{2},$$

n being a large integer.

Hint: Apply Liapounoff's theorem to n variables distributed according to Poisson's
law with parameter 1.
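
Before attempting the proof, the formula of Problem 4 can be examined numerically. The Python sketch below (an illustration; the function name is an assumption) computes the ratio of the partial sum to $e^n$, adding the terms through their logarithms so that no overflow occurs for large n.

```python
import math

def partial_exponential_ratio(n):
    """(1 + n/1! + n^2/2! + ... + n^n/n!) / e^n for an integer n >= 1."""
    total = 0.0
    for k in range(n + 1):
        # each term n^k / k! / e^n is formed as exp(k*log(n) - log(k!) - n)
        total += math.exp(k * math.log(n) - math.lgamma(k + 1) - n)
    return total

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, partial_exponential_ratio(n))
```

The printed values decrease slowly toward 1/2, in agreement with the asymptotic formula.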

5. By resorting to the fundamental lemma, prove the following theorem due to
Markoff: If for a variable $s_n$ with the mean = 0 and the standard deviation = 1

$$\lim_{n\to\infty} E(s_n^k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^k e^{-\frac{x^2}{2}}\,dx$$

for any given k = 3, 4, 5, . . . , then the probability of the inequality $s_n < t$ tends
to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

6. In many special cases the limit of the error term can be considerably lower than
that given in Sec. 6. For instance, if variables $x_1, x_2, \ldots, x_n$ are identical and uniformly
distributed in the interval $(-\tfrac{1}{2}, \tfrac{1}{2})$ the probability $F_n(t)$ of the inequality


differs from 


tCi -h iCa + 


Xn t. 


4 


n 

12 



e ^ du 


by less (in absolute value) than 


1 if 2Y 2^ 

7.5?2 7r\jr/ TT^n 


24 


the last two terms being completely negligible for somewhat large n. 
Indication of the Proof. First establish the inequalities 

<p ^ <p 


for 0 ^ ^ ^ t/2. Further, represent F„(0 by the integral 



and split it into two integrals taken between 0 and ttV^/V^ and Tr^/nj's/^ and 
+ 00 . 

7. Supposing again that $x_1, x_2, \ldots, x_n$ are identical and uniformly distributed in
the interval $(-\tfrac{1}{2}, \tfrac{1}{2})$, prove that for $n \ge 2$


E\xi + X2 A- • • • A- Xn\ 


\ ^ 60 Vw’ 


0 < 0 < 1 . 


8. Let $s_n$ be a variable with the mean = 0 and standard deviation = 1. If its
characteristic function $\varphi_n(t)$ tends to $e^{-\frac{t^2}{2}}$ as $n \to \infty$ uniformly in any finite interval
$-l \le t \le l$, show that

9. If independent variables $x_1, x_2, \ldots, x_n$ with means = 0 satisfy Liapounoff's
condition, prove that


E\xi -h X 2 A- ’ * • Xn\ 



10. Show that for a simple chain of trials 


, j2npqlA-S 

p being the mean probability in infinite series of trials and $\delta = p' - p''$.

11. A series of dependent trials can be illustrated by the following urn scheme: 
Two urns, 1 and 2, contain white and black balls in such proportions that the probability
of drawing a white ball from 1 is p, whereas the probability of drawing a
white ball from 2 is q = 1 − p. Whenever a ball taken from an urn is white, the
next ball is taken from the same urn, but if it is black, the next ball is drawn from the 
other urn. The urn at the first drawing is selected by lot, the probabilities of select- 
ing the first or the second urn being given. Evidently the course of trials is deter- 
mined by these rules without any ambiguity. Let m denote the number of white balls 
obtained in n drawings and let 


$$\alpha = p^2 + q^2.$$


Show that the probability of the inequality 


m -na< fs/LccQ. - a)n; L = 

1 - 2pg 


approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

Indication of the Proof. Let

$$P^{(1)}_{m,n}; \quad P^{(2)}_{m,n}; \quad P^{(3)}_{m,n}; \quad P^{(4)}_{m,n}$$

be the probabilities of having $m$ white balls in $n$ trials when (a) the last ball is white
and from urn 1; (b) the last ball is white and from urn 2; (c) the last ball is black and
from urn 1; and (d) the last ball is black and from urn 2. The sum


= piM 

represents the probability of having exactly m white balls in n trials. The generating 
functions of probabilities satisfy the following equations 

piii = 

= pU‘^ + 

whence it can be shown that they all, as well as their sum, the generating function of
$P_{m,n}$, satisfy the same equation of the second order

$$Z_{n+2} - tZ_{n+1} + pq(t^2 - 1)Z_n = 0.$$
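
The second-order equation can be checked numerically. The following is only a sketch under stated assumptions: instead of the four probabilities of the text it tracks two auxiliary generating-function values, `a` (the next ball will come from urn 1) and `b` (the next ball will come from urn 2); the names `step`, `generating_values`, `max_residual` and the parameter `c` (probability that the first urn is urn 1) are hypothetical and not part of the original.

```python
def step(a, b, p, t):
    """Advance the two generating-function values by one drawing.

    a: value for 'next ball is drawn from urn 1'
    b: value for 'next ball is drawn from urn 2'
    A white ball contributes a factor t and keeps the urn;
    a black ball contributes a factor 1 and switches the urn.
    """
    q = 1.0 - p
    new_a = p * t * a + p * b      # white from urn 1, or black from urn 2
    new_b = q * a + q * t * b      # black from urn 1, or white from urn 2
    return new_a, new_b


def generating_values(p, t, c, n_max):
    """Return [Z_0, Z_1, ..., Z_n_max] evaluated at the point t."""
    a, b = c, 1.0 - c              # c = probability of starting with urn 1 (assumed name)
    values = [a + b]
    for _ in range(n_max):
        a, b = step(a, b, p, t)
        values.append(a + b)
    return values


def max_residual(p, t, c, n_max=12):
    """Largest |Z_{n+2} - t Z_{n+1} + p q (t^2 - 1) Z_n| over the computed range."""
    q = 1.0 - p
    z = generating_values(p, t, c, n_max)
    return max(abs(z[n + 2] - t * z[n + 1] + p * q * (t * t - 1.0) * z[n])
               for n in range(n_max - 1))
```

The residual vanishes up to rounding for any choice of `p`, `t` and `c`, because the one-step transformation has trace $t$ and determinant $pq(t^2 - 1)$.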
Setting t = one of the characteristic roots will be given by 

4/2 

(1 ~2pg)tw— 4- . • - 

e ^ 

for small $u$, while the other root tends to 0 as $u \to 0$. The final conclusion can now
be reached in the same way as in Examples 1 and 2, pages 297 and 301.
CHAPTER XV


NORMAL DISTRIBUTION IN TWO DIMENSIONS. LIMIT 
THEOREM FOR SUMS OF INDEPENDENT VECTORS. 
ORIGIN OF NORMAL CORRELATION 


1. The concept of normal distribution can easily be extended to two 
and more variables. Since the extension to more than two variables 
does not involve new ideas, we shall confine ourselves to the case of 
two-dimensional normal distribution. 

Two variables, $x, y$, are said to be normally distributed if for them
the density of probability has the form

$$e^{-\varphi}$$

where

$$\varphi = ax^2 + 2bxy + cy^2 + 2dx + 2ey + f$$

is a quadratic function of $x, y$ becoming positive and infinitely large
together with $|x| + |y|$. This requirement is fulfilled if, and only if,

$$ax^2 + 2bxy + cy^2$$

is a positive quadratic form. The necessary and sufficient conditions
for this are:

$$a > 0; \qquad ac - b^2 = \Delta > 0.$$

Since $\Delta > 0$ (even a milder requirement $\Delta \neq 0$ suffices), constants $x_0, y_0$
can be found so that

$$\varphi = a(x - x_0)^2 + 2b(x - x_0)(y - y_0) + c(y - y_0)^2 + g$$

identically in $x, y$. It follows that the density of probability $e^{-\varphi}$ may be
presented thus:

$$e^{-\varphi} = Ke^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}.$$

The expression in the right member depends on six parameters $K$;
$a, b, c$; $x_0, y_0$. But the requirement

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\varphi}\,dx\,dy = 1$$

reduces the number of independent parameters to five. We can take
$a, b, c$; $x_0, y_0$ for independent parameters and determine $K$ by the condition

$$K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}\,dx\,dy = 1$$

which, by introducing new variables

$$\xi = x - x_0, \qquad \eta = y - y_0$$

can be exhibited thus

$$K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}\,d\xi\,d\eta = 1.$$

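To evaluate this and similar double integrals we observe that the positive
quadratic form

$$a\xi^2 + 2b\xi\eta + c\eta^2$$

can be presented in infinitely many ways as a sum of two squares

$$a\xi^2 + 2b\xi\eta + c\eta^2 = (\alpha\xi + \beta\eta)^2 + (\gamma\xi + \delta\eta)^2$$

whence

$$a = \alpha^2 + \gamma^2; \qquad c = \beta^2 + \delta^2; \qquad b = \alpha\beta + \gamma\delta$$

and

$$(\alpha\delta - \beta\gamma)^2 = \Delta.$$

By changing the signs of $\alpha$ and $\beta$ if necessary, we can always suppose

$$\alpha\delta - \beta\gamma = +\sqrt{\Delta}.$$

Now we take

$$u = \alpha\xi + \beta\eta, \qquad v = \gamma\xi + \delta\eta$$

for new variables of integration. Since the Jacobian of $u, v$ with respect
to $\xi, \eta$ is $\sqrt{\Delta}$, the Jacobian of $\xi, \eta$ with respect to $u, v$ will be $1/\sqrt{\Delta}$ and,
by the known rules

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}\,d\xi\,d\eta = \frac{1}{\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-u^2 - v^2}\,du\,dv = \frac{\pi}{\sqrt{\Delta}}.$$

Thus

$$\frac{K\pi}{\sqrt{\Delta}} = 1, \qquad K = \frac{\sqrt{\Delta}}{\pi}.$$

That is, the general expression for the density of probability in two-dimensional
normal distribution is

$$\frac{\sqrt{ac - b^2}}{\pi}\, e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}.$$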
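
The value $K = \sqrt{\Delta}/\pi$ can be confirmed numerically. The sketch below is illustrative only: the function names, the integration window `half_width` and the mesh `step` are assumptions made for the example, and a plain midpoint rule is used rather than anything from the text.

```python
import math


def normal_density(x, y, a, b, c, x0=0.0, y0=0.0):
    """Two-dimensional normal density with K = sqrt(ac - b^2) / pi."""
    delta = a * c - b * b
    if a <= 0 or delta <= 0:
        raise ValueError("need a > 0 and ac - b^2 > 0")
    dx, dy = x - x0, y - y0
    k = math.sqrt(delta) / math.pi
    return k * math.exp(-(a * dx * dx + 2.0 * b * dx * dy + c * dy * dy))


def total_probability(a, b, c, x0=0.0, y0=0.0, half_width=8.0, step=0.05):
    """Midpoint-rule approximation of the double integral of the density
    over the square [-half_width, half_width]^2 (assumed wide enough)."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        for j in range(n):
            y = -half_width + (j + 0.5) * step
            total += normal_density(x, y, a, b, c, x0, y0)
    return total * step * step
```

For instance, `total_probability(2.0, 0.5, 1.0, 0.3, -0.2)` should be equal to 1 up to a very small numerical error, in agreement with the normalizing condition.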


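2. Parameters $x_0, y_0$ represent the mean values of variables $x, y$.
To prove this, let us consider

$$E(x - x_0) = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - x_0)\, e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}\,dx\,dy.$$

To evaluate the double integral, we can express $x$ and $y$ through new
variables $u, v$ introduced in the preceding section. We have

$$x - x_0 = \frac{\delta u - \beta v}{\sqrt{\Delta}}, \qquad y - y_0 = \frac{-\gamma u + \alpha v}{\sqrt{\Delta}}$$

and

$$E(x - x_0) = \frac{1}{\pi\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\delta u - \beta v)\, e^{-u^2 - v^2}\,du\,dv = 0,$$

whence

$$E(x) = x_0,$$

and similarly

$$E(y) = y_0.$$
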


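3. Having found the meaning of $x_0, y_0$ we may consider instead of $x, y$,
variables $x - x_0$, $y - y_0$ whose mean values = 0. Denoting these new
variables by $x, y$ again the expression of the density of probability for
$x, y$ will be:

$$\frac{\sqrt{ac - b^2}}{\pi}\, e^{-ax^2 - 2bxy - cy^2}.$$

It contains only three parameters, $a, b, c$. To find the intrinsic meaning
of $a, b, c$ let us consider the mathematical expectation of $(x + \lambda y)^2$
where $\lambda$ is an arbitrary constant. We have

$$E(x + \lambda y)^2 = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + \lambda y)^2\, e^{-ax^2 - 2bxy - cy^2}\,dx\,dy$$

or, introducing $u, v$ defined as in Sec. 1 as new variables of integration,

$$E(x + \lambda y)^2 = \frac{1}{\pi\Delta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[(\delta - \lambda\gamma)^2u^2 + 2(\delta - \lambda\gamma)(-\beta + \lambda\alpha)uv + (\beta - \lambda\alpha)^2v^2\right] e^{-u^2 - v^2}\,du\,dv$$

$$= \frac{1}{\pi\Delta}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[(\delta - \lambda\gamma)^2 + (\beta - \lambda\alpha)^2\right]u^2 e^{-u^2 - v^2}\,du\,dv = \frac{(\delta - \lambda\gamma)^2 + (\beta - \lambda\alpha)^2}{2\Delta}.$$

But

$$(\delta - \lambda\gamma)^2 + (\beta - \lambda\alpha)^2 = c - 2b\lambda + a\lambda^2$$

whence

$$E(x^2) + 2\lambda E(xy) + \lambda^2E(y^2) = \frac{c}{2\Delta} - 2\lambda\frac{b}{2\Delta} + \lambda^2\frac{a}{2\Delta}$$

and since $\lambda$ is arbitrary

$$E(x^2) = \frac{c}{2\Delta}, \qquad E(xy) = -\frac{b}{2\Delta}, \qquad E(y^2) = \frac{a}{2\Delta}.$$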
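On the other hand, if $\sigma_1$, $\sigma_2$, and $r$ are respectively standard deviations
of $x, y$ and their correlation coefficient, we have

$$E(x^2) = \sigma_1^2, \qquad E(xy) = r\sigma_1\sigma_2, \qquad E(y^2) = \sigma_2^2.$$

Hence

$$\frac{c}{2\Delta} = \sigma_1^2; \qquad \frac{a}{2\Delta} = \sigma_2^2; \qquad \frac{b}{2\Delta} = -r\sigma_1\sigma_2$$

and

$$\frac{ac - b^2}{4\Delta^2} = \frac{1}{4\Delta} = \sigma_1^2\sigma_2^2(1 - r^2)$$

or

$$\sqrt{\Delta} = \frac{1}{2\sigma_1\sigma_2\sqrt{1 - r^2}}.$$

Finally,

$$a = \frac{1}{2\sigma_1^2(1 - r^2)}, \qquad b = -\frac{r}{2\sigma_1\sigma_2(1 - r^2)}, \qquad c = \frac{1}{2\sigma_2^2(1 - r^2)}.$$

With these values for $a, b, c$, and $\sqrt{\Delta}$ the density of probability can
be presented as follows:

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}$$

and the probability for a point $x, y$ to belong to a given domain $D$ will be
expressed by the double integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int\int_{(D)} e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}\,dx\,dy$$

extended over $D$.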
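
The passage between the parameters $a, b, c$ and $\sigma_1, \sigma_2, r$ can be made concrete with a short sketch. The helper names below are hypothetical; the code merely evaluates the formulas of this section and lets one confirm that the two forms of the density agree and that $c/2\Delta$, $-b/2\Delta$, $a/2\Delta$ reproduce the second moments.

```python
import math


def abc_from_moments(s1, s2, r):
    """Return (a, b, c) for standard deviations s1, s2 and correlation r."""
    d = 2.0 * (1.0 - r * r)
    return 1.0 / (d * s1 * s1), -r / (d * s1 * s2), 1.0 / (d * s2 * s2)


def density_abc(x, y, a, b, c):
    """Density sqrt(ac - b^2)/pi * exp(-(a x^2 + 2 b x y + c y^2))."""
    return (math.sqrt(a * c - b * b) / math.pi
            * math.exp(-(a * x * x + 2.0 * b * x * y + c * y * y)))


def density_moments(x, y, s1, s2, r):
    """The same density written through s1, s2 and r."""
    q = (x / s1) ** 2 - 2.0 * r * (x / s1) * (y / s2) + (y / s2) ** 2
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - r * r)
    return math.exp(-q / (2.0 * (1.0 - r * r))) / norm
```

A typical use is to compute `a, b, c = abc_from_moments(1.5, 0.8, -0.6)` and compare `density_abc(0.7, -0.4, a, b, c)` with `density_moments(0.7, -0.4, 1.5, 0.8, -0.6)`.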
4. Curves

$$\frac{1}{2(1 - r^2)}\left[\frac{x^2}{\sigma_1^2} - 2r\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right] = l = \text{const.}$$

are evidently similar and similarly placed ellipses with the common
center at the origin. For obvious reasons they are called ellipses of
equal probability. The area of an ellipse corresponding to a given value
of $l$ (ellipse $l$) is

$$\frac{\pi l}{\sqrt{\Delta}} = 2\pi l\sigma_1\sigma_2\sqrt{1 - r^2}$$

whence the area of an infinitesimal ring between ellipses $l$ and $l + dl$
has the expression

$$2\pi\sigma_1\sigma_2\sqrt{1 - r^2}\,dl.$$

The infinitesimal probability for a point $x, y$ to lie in that ring is
expressed by

$$e^{-l}\,dl.$$

Finally, by integrating this expression between limits $l_1$ and $l_2 > l_1$, we
obtain

$$e^{-l_1} - e^{-l_2}$$

as the expression of the probability for $x, y$ to belong to the ring between
two ellipses $l_1$ and $l_2$. If $l_1 = 0$ and $l_2 = l$,

$$1 - e^{-l}$$

gives the probability for $x, y$ to belong to the ellipse $l$.

If $n$ numbers $l, l_1, l_2, \ldots l_{n-1}$ are determined by the conditions

$$1 - e^{-l} = e^{-l} - e^{-l_1} = e^{-l_1} - e^{-l_2} = \cdots = e^{-l_{n-2}} - e^{-l_{n-1}} = \frac{1}{n + 1}$$

the whole plane is divided into $n + 1$ regions of equal probability:
namely, the interior of the ellipse $l$, rings between $l, l_1$; $l_1, l_2$; $\ldots$ $l_{n-2}, l_{n-1}$
and, finally, part of the plane outside of the ellipse $l_{n-1}$.
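
The levels $l, l_1, \ldots l_{n-1}$ can be written in closed form, since the conditions give $e^{-l_j} = 1 - (j + 1)/(n + 1)$ when $l$ is counted as $l_0$. A small sketch follows; the function names are hypothetical and the indexing convention ($l$ stored first) is an assumption of the example.

```python
import math


def equal_probability_levels(n):
    """Return [l, l_1, ..., l_{n-1}] dividing the plane into n + 1
    regions of probability 1/(n + 1) each."""
    return [-math.log(1.0 - (j + 1) / (n + 1)) for j in range(n)]


def region_probabilities(levels):
    """Probabilities of: interior of the first ellipse, the successive
    rings, and the part of the plane outside the last ellipse."""
    inside = [1.0 - math.exp(-l) for l in levels]   # probability inside each ellipse
    bounds = [0.0] + inside + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def ellipse_area(l, s1, s2, r):
    """Area 2*pi*l*s1*s2*sqrt(1 - r^2) of the ellipse of level l."""
    return 2.0 * math.pi * l * s1 * s2 * math.sqrt(1.0 - r * r)
```

With `n = 4`, `region_probabilities(equal_probability_levels(4))` returns five numbers, each equal to 1/5 up to rounding.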

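5. To find the distribution function of the variable $x$ (without any
regard to $y$), we must take for $D$ the domain

$$-\infty < x < t; \qquad -\infty < y < +\infty.$$

As the integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_{-\infty}^{t}\int_{-\infty}^{\infty} e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}\,dx\,dy$$

$$= \frac{1}{2\pi\sigma_1\sqrt{1 - r^2}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx\int_{-\infty}^{\infty} e^{-\frac{z^2}{2(1 - r^2)}}\,dz = \frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx$$

(where $z = y/\sigma_2 - rx/\sigma_1$), we see that the probability of the inequality

$$x < t$$

is expressed by

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx.$$

Similarly, the probability of the inequality

$$y < t$$

is

$$\frac{1}{\sigma_2\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{y^2}{2\sigma_2^2}}\,dy.$$
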
Thus, if two variables $x, y$ are normally distributed with their
means = 0, each one of them taken separately has a normal distribution
of probability with the common mean 0 and the respective standard
deviations $\sigma_1$ and $\sigma_2$. Variables $x$ and $y$ are not independent except when
$r = 0$. For if they were independent the probability of the point
$x, y$ belonging to an infinitesimal rectangle

$$t < x < t + dt; \qquad \tau < y < \tau + d\tau$$

would be

$$\frac{1}{2\pi\sigma_1\sigma_2}\, e^{-\frac{t^2}{2\sigma_1^2} - \frac{\tau^2}{2\sigma_2^2}}\,dt\,d\tau$$

whereas it is

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{t}{\sigma_1}\right)^2 - 2r\frac{t}{\sigma_1}\frac{\tau}{\sigma_2} + \left(\frac{\tau}{\sigma_2}\right)^2\right]}\,dt\,d\tau,$$

and these expressions are different unless $r = 0$. Thus, except for $r = 0$,
normally distributed variables are necessarily dependent in the sense
of the theory of probability. Dependent variables are often called
"correlated variables." In particular, variables are said to be in "normal
correlation" when they are normally distributed.

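6. The probability of simultaneous inequalities

$$X < x < X', \qquad y < t$$

is represented by the repeated integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx\int_{-\infty}^{t} e^{-\frac{1}{2\sigma_2^2(1 - r^2)}\left(y - r\frac{\sigma_2}{\sigma_1}x\right)^2}\,dy$$

while

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx$$

is the probability that $x$ will be contained between $X$ and $X'$. Hence
(Chap. XII, Sec. 10) the ratio

$$\frac{1}{\sigma_2\sqrt{2\pi(1 - r^2)}}\cdot\frac{\displaystyle\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx\int_{-\infty}^{t} e^{-\frac{1}{2\sigma_2^2(1 - r^2)}\left(y - r\frac{\sigma_2}{\sigma_1}x\right)^2}\,dy}{\displaystyle\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx}$$

can be considered as the probability of the inequality

$$y < t$$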
it being known that $x$ is contained between $X$ and $X'$. Considering $X'$ as
variable and converging to $X$ the above ratio evidently tends to the
limit

$$\frac{1}{\sigma_2\sqrt{2\pi(1 - r^2)}}\int_{-\infty}^{t} e^{-\frac{1}{2\sigma_2^2(1 - r^2)}\left(y - r\frac{\sigma_2}{\sigma_1}X\right)^2}\,dy$$

which can be considered as the distribution function of $y$ when $x$ has a
fixed value $X$. Hence, $y$ for $x = X$ has a normal distribution with the
standard deviation

$$\sigma_2\sqrt{1 - r^2}$$

and the mean

$$Y = r\frac{\sigma_2}{\sigma_1}X.$$

Interpreted geometrically, this equation represents the so-called
"line of regression" of $y$ on $x$.

In a similar way, we conclude that for $y = Y$ the distribution of $x$
is normal with the standard deviation

$$\sigma_1\sqrt{1 - r^2}$$

and the mean

$$X = r\frac{\sigma_1}{\sigma_2}Y.$$

This equation represents the line of regression of $x$ on $y$.
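
The two lines of regression and the conditional standard deviations can be illustrated by a short sketch. The function names are hypothetical; the check that the joint density factors into the marginal density of $x$ times the conditional density of $y$ is only a numerical restatement of the argument of this section.

```python
import math


def normal_pdf(z, mean, sd):
    """One-dimensional normal density."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def joint_pdf(x, y, s1, s2, r):
    """Two-dimensional normal density with means 0."""
    q = (x / s1) ** 2 - 2.0 * r * (x / s1) * (y / s2) + (y / s2) ** 2
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - r * r)
    return math.exp(-q / (2.0 * (1.0 - r * r))) / norm


def regression_of_y_on_x(X, s1, s2, r):
    """Mean and standard deviation of y when x has the fixed value X."""
    return r * s2 / s1 * X, s2 * math.sqrt(1.0 - r * r)


def regression_of_x_on_y(Y, s1, s2, r):
    """Mean and standard deviation of x when y has the fixed value Y."""
    return r * s1 / s2 * Y, s1 * math.sqrt(1.0 - r * r)
```

For example, with `s1 = 2.0`, `s2 = 0.5`, `r = 0.6` the call `regression_of_y_on_x(1.2, 2.0, 0.5, 0.6)` gives a mean close to 0.18 and a standard deviation close to 0.4.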

LIMIT THEOREM FOR SUMS OF INDEPENDENT VECTORS 

7. So far normal distribution in two dimensions has been considered 
abstractly without indication of its natural origin. One-dimensional 
normal distribution may be considered as a limiting case of probability 
distributions of sums of independent variables. In the same manner 
two-dimensional normal distribution or normal correlation appears as a 
limit of probability distributions of sums of independent vectors. 

Two series of stochastic variables

$$x_1, x_2, \ldots x_n$$
$$y_1, y_2, \ldots y_n$$

define $n$ stochastic vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$ so that $x_i, y_i$ represent
components of $\mathbf{v}_i$ on two fixed coordinate axes. If

$$E(x_i) = a_i, \qquad E(y_i) = b_i$$

the vector with the components $a_i, b_i$ is called the mean value of $\mathbf{v}_i$.
Evidently the mean value of

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n$$

is represented by the vector

$$\mathbf{a} = \mathbf{a}_1 + \mathbf{a}_2 + \cdots + \mathbf{a}_n$$

and that of $\mathbf{v} - \mathbf{a}$ is a vanishing vector. Without loss of generality
we may assume at the outset that

$$E(x_i) = E(y_i) = 0; \qquad i = 1, 2, \ldots n,$$

in which case $E(\mathbf{v}) = 0$. Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$ are said to be
independent if variables $x_i, y_i$ are independent of the rest of the variables
$x_j, y_j$ where $j \neq i$.

In what follows we shall deal exclusively with independent vectors. 

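8. As before, let $x_k, y_k$ be components of the vector

$$\mathbf{v}_k \quad (k = 1, 2, \ldots n).$$

Then

$$X = x_1 + x_2 + \cdots + x_n$$
$$Y = y_1 + y_2 + \cdots + y_n$$

will be the components of the sum

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n.$$

If

$$E(x_k) = E(y_k) = 0$$
$$E(x_k^2) = b_k, \qquad E(y_k^2) = c_k, \qquad E(x_ky_k) = d_k$$

then

$$E(X) = 0, \qquad E(Y) = 0$$
$$E(X^2) = b_1 + b_2 + \cdots + b_n = B_n$$
$$E(Y^2) = c_1 + c_2 + \cdots + c_n = C_n$$
$$E(XY) = d_1 + d_2 + \cdots + d_n = r_n\sqrt{B_nC_n}$$

because

$$E(x_iy_j) = 0 \quad \text{if} \quad j \neq i$$

variables $x_i$ and $y_j$ being independent.

Let us introduce instead of variables $x_k, y_k$ $(k = 1, 2, \ldots n)$ new
variables

$$\xi_k = \frac{x_k}{\sqrt{B_n}}, \qquad \eta_k = \frac{y_k}{\sqrt{C_n}}$$

and correspondingly

$$s = \frac{X}{\sqrt{B_n}}, \qquad \sigma = \frac{Y}{\sqrt{C_n}}$$

instead of $X, Y$. We shall have:

$$E(\xi_k) = E(\eta_k) = 0$$
$$E(\xi_k^2) = \frac{b_k}{B_n}; \qquad E(\eta_k^2) = \frac{c_k}{C_n}$$

and

$$E(s) = E(\sigma) = 0$$
$$E(s^2) = E(\sigma^2) = 1$$
$$E(s\sigma) = r_n.$$

The quantity $r_n$, the correlation coefficient of $s$ and $\sigma$, is in absolute value
$\leq 1$. We define

$$\varphi(u, v) = E\left(e^{i(us + v\sigma)}\right)$$

as the characteristic function of the vector $s, \sigma$. Evidently $\varphi(u, 0)$ and
$\varphi(0, v)$ are respectively the characteristic functions of $s$ and $\sigma$. Since

$$e^{i(us + v\sigma)} = e^{i(u\xi_1 + v\eta_1)} \cdot e^{i(u\xi_2 + v\eta_2)} \cdots e^{i(u\xi_n + v\eta_n)}$$

and the factors in the right-hand member represent independent variables,
we shall have

$$\varphi(u, v) = E\left(e^{i(u\xi_1 + v\eta_1)}\right)E\left(e^{i(u\xi_2 + v\eta_2)}\right)\cdots E\left(e^{i(u\xi_n + v\eta_n)}\right).$$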
9. For what follows it is very important to investigate the behavior
of $\varphi(u, v)$ when $n$ increases indefinitely while $u, v$ do not exceed an
arbitrary but fixed number $l$ in absolute value.

Let

$$E|x_k|^3 = f_k, \qquad E|y_k|^3 = g_k$$

and

$$\frac{f_1 + f_2 + \cdots + f_n}{B_n^{3/2}} = \omega_n$$

$$\frac{g_1 + g_2 + \cdots + g_n}{C_n^{3/2}} = \eta_n.$$

If $\omega_n$ and $\eta_n$ tend to 0 as $n \to \infty$, we shall have

( 1 ) v) — g--KwH* 2 rnWi;+v 2 ) I Q^moin+tjn) — J 

provided 
and n is so large as to make 


Z(a)| + 77 I) < 1 . 


Since 


= 1 + i{uik + vrik) ~ ^{u^h + mkY + 




we shall have : 


9 ' 


+ ^E\u^, + wkl^; \e'\ < 1. 


On the other hand, 


1 - - 


2dh 


uv 


^2 _ Q 2nn“' 2^yBnCrr^ 


bk . 2 dk Ck 

—rr-=rU^ - UV ---^2?^ 


25n 2 Vb:c: 2 c; 


+ \[E{uh + |0"l < 1 


and so 


bk „ 2dk Ck , 






Furthermore, 


E{u^k + ^ + 2 w|r 7 | + »?|) <1 


because 


^(11) = ^ < ‘oi EivD = # < ’jI, 

JDn 


Also 


lE{u^k + vvk)^? < [E(u^k + vvk)^]^ ^ E\u^k + yr//cl* 

^ 'f” 

Taking into account these various inequalities, we may write 

^(e«=^£,+r,*)) = e 2 V^ (1 + ffk) 
where 

Finally, 


^{XL, V) = ^ (1 + ^ 2 ) « * * (1 + (Tn) 

and 


\4>{Uy v) — e~Kw^2r„wr4-ij2) I <;; g|ffi|+k 2 |+ • • • +M — ^ < — 1 


as was stated. 

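10. Theorem. Let $P$ denote the probability of simultaneous inequalities

$$t_0 \leq s < t_1; \qquad \tau_0 \leq \sigma < \tau_1.$$

Provided $r_n$ remains less than a fixed number $\alpha < 1$ in absolute value and
the above introduced quantities $\omega_n$, $\eta_n$ tend to 0 as $n \to \infty$, $P$ can be expressed
as

$$P = \frac{1}{2\pi\sqrt{1 - r_n^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau + \Delta_n$$

where $\Delta_n$ tends to 0 uniformly in $t_0, t_1$; $\tau_0, \tau_1$.

If, in addition, $r_n$ itself tends to the limit $r$ $(|r| < 1)$, $P$ will tend uniformly
to

$$\frac{1}{2\pi\sqrt{1 - r^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1 - r^2)}(t^2 - 2rt\tau + \tau^2)}\,dt\,d\tau.$$
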

Proof. a. In trying to extend Liapounoff's proof to the present case
we introduce an auxiliary quantity $\Pi$ defined as

/ 1 /u-sY rri /v-<rY \ 

n = \ ^^du- \ e y^^dvl 

Using the inequality 


L f" 

\/^Jx 


e~^''dt < 


for 


:r > 0, 


one can easily derive the following inequalities: 


( 2 ) 


1 ^ — A 2 nri / t? — <r \ 2 

I- e V W dw ■ e V i _ i< 

•’TJJo Jro 

l / / <!-- / fo - a \ 2 — /' to— A A 

^ a V^/-(-g \A / -j- g V A / y 
if 


(3) to ^ s < tlj To ^ (T < Tl, 

and 



( Tl — <r \ 2 ^ ✓ TO — cr \ 2\ 

h J ^ e \ h J ) 


if at least one of the inequalities (3) is not fulfilled. From the definition
of $\Pi$, $P$ and from (2) and (4) it follows that


[p - n| < ^E\e V w +e y J 




But referring to (1) and setting 


giZHo^n+vn) — 1 = a„(Z) 

we have by virtue of the developments in Chap. XIV, Sec. 3, 


(5) 


|p - n| < 2 «„(f) + hV2 + 


-(-Y 

8 e 

■x/tt 


b. Replacing $t_1$, $\tau_1$ by variable quantities $t$, $\tau$ and taking the second
derivative of $\Pi$ with respect to $t$ and $\tau$, we get


dm 

dtdr 



On the other hand 





g-i«U+T1>)gi(u»+OT)£^y^j,^ 


whence 

S ■ iX .X vVvdv. 

Here we substitute 

<i>{u, v) = e-i(«»+ 2 r,ur+r!! 

For all real u, v 

\g{u, t;)| g 2. 

If $|u| \leq l$, $|v| \leq l$, where $l$ is an arbitrarily fixed number, and $n$ is large
enough, we have


\g{u, i>)| ^ anil). 
Hence, the double integral 

iJJ- 




'o)dudv 


extended over the region outside of the square $|u| \leq l$, $|v| \leq l$ is less than




2x2, 


e ^ rdr < 


hn^ 




in absolute value. The same double integral extended over the square
$|u| \leq l$, $|v| \leq l$ is less than

TT 


in absolute value. Thus, referring to (6) 


dm 

dtdr 


- U-S 


A2 1 

— -j- (2i2 ^,2) _ _ (y 2 2rnUtJ + 2J 2) 


e-ntu+Tv,^^y + R 


and 


hn^ 


Now 


~(u2 + tl2) 


\R\ < ^ 


l_^(„2+,2). |x|<l 


and 

Jh^ 

16x2 


J oo n 00 

I 

- 00 ^ — 00 


^(uH-2rnuv'{-v^) _|_ p^'^dudv 




< 




Hence 


and 


4x(l — 4x(l 

dm 1 r " r ” 

'Iirr' = I e“K«2H-2rnU2;+t>2)g~i(fu-H-t;)^^^y + J?' 

dtdr 4xV_ooJ-co 






|2?1 < ^ 

By transformation to new variables 


+ 




4x(l 


^ = u + rnV; I? = vy/T 
the foregoing double integral becomes 

1 


so that finally 


27rVT 


=6 2(1 — rn 2) 


(i2-2r„(r+T2) 


1 


dm 


dtdr 27r \/ 1 “ 




Integrating this expression with respect to $t$ and $\tau$ between limits $t_0, t_1$
and $\tau_0, \tau_1$, we get:


( 7 ) 


n = 


27rV'l 


HJ7 

1 rlJto Jr 


” -57T^-r«(*“-2r„(r+r2) 

g 2(1 -r»2) ^ p 


where 

(8) |p| < {h - io)(n - to) 


(W)2 




fitnil) + 




+ 




Airil - a^)^j 


Hence combining inequality (5) with (7) and (8), 


P = 


27rVl 


HJ‘T 


g 2(1 -rn2)^^ 2r„<T+r ] 


where 
1A„| < 


2 -[ — io)(Ti — To) 

TT 


(hir- 

<..® + r + 

+ «i - k)(r, - ..)l + iV2 + 

47r(l — 

Considering $t_0, t_1$; $\tau_0, \tau_1$ as variable and denoting an arbitrarily large
number by $L$, we shall assume at first that the rectangle $D$

$$t_0 \leq s \leq t_1, \qquad \tau_0 \leq \sigma \leq \tau_1$$

is completely contained in the square $Q$:

$$|s| \leq L, \qquad |\sigma| \leq L.$$

Then, taking h — l~^ we shall have 

lA.i < (2 + ^)aM + ie~i(-^r‘‘+iL<] + 3 - 
Given an arbitrary positive number $\varepsilon$, we take $l$ so large as to have
/ o __3 V 7 27—1 1 

Ze ¥ -4=Z 2 + + V2Z 2 + —±L 1 -3 < 

/ :r(l-« 2)2 

After that, since $\alpha_n(l) \to 0$ as $n \to \infty$ (for a fixed $l$) we can find a number
$n_0(\varepsilon)$ so that

for $n > n_0(\varepsilon)$. Finally, we shall have

$$|\Delta_n| < \varepsilon$$

as soon as $n > n_0(\varepsilon)$; that is, $\Delta_n$ tends to 0 uniformly in any rectangle $D$
contained in the square $Q$ with an arbitrarily large side $2L$.

c. To prove that $\Delta_n$ tends to 0 uniformly no matter what are $t_0, t_1$;
$\tau_0, \tau_1$ we observe that the integral

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\int\int e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau$$

extended over the area outside of $Q$ becomes infinitesimal as $L \to \infty$.
Accordingly, we take $L$ so large as to make this integral $< \varepsilon/2$ (no matter
what $n$ is) and in addition to have $1/L^2 < \varepsilon/4$. The number $L$ selected
according to these requirements will be kept fixed.

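Let $D'$ represent that part of $D$ which is inside $Q$, the remaining part or
parts (if there are any) being $D''$. Let $P'$ and $P''$ denote the probabilities
that the point $s, \sigma$ shall be contained in $D'$ or $D''$, respectively. Also,
let $J'$ and $J''$ be the integrals

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\int\int e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau$$

extended over $D'$ and $D''$, respectively. By what has been proved, given
$\varepsilon > 0$ a number $n_0(\varepsilon)$ can be found so that

$$|P' - J'| < \varepsilon$$

for $n > n_0(\varepsilon)$. Now

$$P = P' + P''; \qquad J = J' + J''$$

whence

$$|P - J| < \varepsilon + P'' + J''$$

for $n > n_0(\varepsilon)$. Since by Tshebysheff's lemma (Chap. X, Sec. 1) the
probability of either one of the inequalities

$$|s| > L \quad \text{or} \quad |\sigma| > L$$

is less than $1/L^2$, we shall have

$$P'' < \frac{2}{L^2} < \frac{\varepsilon}{2}.$$

Also,

$$J'' < \frac{\varepsilon}{2}$$

whence

$$|P - J| < 2\varepsilon$$

for $n > n_0(\varepsilon)$; that is, the difference

$$P - \frac{1}{2\pi\sqrt{1 - r_n^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau$$

tends to 0 uniformly, no matter what $t_0, t_1$; $\tau_0, \tau_1$ are.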

Finally, the last statement of the theorem appears as almost evident 
and does not require an elaborate proof. 

11. The theorem just proved concerns the asymptotic behavior of
the probability $P$ of simultaneous inequalities

$$t_0 \leq s < t_1; \qquad \tau_0 \leq \sigma < \tau_1$$

which, due to the definition of $s$ and $\sigma$, are equivalent to the inequalities

$$t_0\sqrt{B_n} \leq x_1 + x_2 + \cdots + x_n < t_1\sqrt{B_n}$$
$$\tau_0\sqrt{C_n} \leq y_1 + y_2 + \cdots + y_n < \tau_1\sqrt{C_n}.$$

From the geometrical standpoint the above domain of $s, \sigma$ is a rectangle.
But the theorem can be extended to the case of any given
domain $R$ for the point $s, \sigma$. It is hardly necessary to enter into details
of the proof based on the definition of a double integral. It suffices to
state the theorem itself:

Fundamental Theorem. The probability for the point $(s, \sigma)$ to be
located in a given domain $R$ can be represented, for large $n$, by the integral

$$\frac{1}{2\pi\sqrt{1 - r_n^2}}\int\int e^{-\frac{1}{2(1 - r_n^2)}(t^2 - 2r_nt\tau + \tau^2)}\,dt\,d\tau$$

extended over $R$, with an error which tends uniformly to 0 as $n$ becomes
infinite, provided

$$\omega_n \to 0, \qquad \eta_n \to 0,$$

while for all $n$

$$|r_n| < \alpha < 1.$$

In less precise terms we may say that under very general conditions 
the probability distribution of the components of a vector which is the 
sum of a great many independent vectors will be nearly normal. 

The first rigorous proof of the limit theorem for sums of independent 
vectors was published by S. Bernstein in 1926. Like the proof developed 
here it proceeds on the same lines as Liapounoff's proof for sums of
independent variables. Moreover, Bernstein has shown that the limit 
theorem may hold even in case of dependent vectors when certain addi- 
tional conditions are fulfilled. 

12. A good illustration of the fundamental theorem is afforded by
series of independent trials with three alternatives, $E, F, G$. For the
sake of simplicity we shall assume that probabilities of $E, F, G$ are
$p, q, r$ in all trials. Naturally

$$p + q + r = 1.$$

In the usual way, we associate with these trials triads of variables

$$x_i, y_i, z_i \quad (i = 1, 2, 3, \ldots)$$

so that

$x_i = 1$ or 0 according as $E$ occurs or fails at the $i$th trial;

$y_i = 1$ or 0 according as $F$ occurs or fails at the $i$th trial;

$z_i = 1$ or 0 according as $G$ occurs or fails at the $i$th trial.

Evidently

$$E(x_i) = E(x_i^2) = p$$
$$E(y_i) = E(y_i^2) = q$$

so that vectors $\mathbf{v}_i$ with components

$$\xi_i = x_i - p, \qquad \eta_i = y_i - q$$

have their means = 0. The independence of trials involves the independence
of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$. Hence we can apply the preceding
considerations to the vector

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n$$

with the components

$$X = \xi_1 + \xi_2 + \cdots + \xi_n$$
$$Y = \eta_1 + \eta_2 + \cdots + \eta_n.$$

We have

$$B_n = E(X^2) = np(1 - p); \qquad C_n = E(Y^2) = nq(1 - q).$$
Moreover,

$$E(\xi_i\eta_i) = E(x_iy_i) - pq = -pq$$

and

$$E(XY) = r_n\sqrt{B_n}\sqrt{C_n} = -npq$$

whence

$$r_n = -\frac{pq}{\sqrt{pq(1 - p)(1 - q)}}.$$
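
These moments can be verified exactly for a small number of trials by summing over the trinomial distribution. The sketch below is an illustration only; the function name `trinomial_moments` and the brute-force enumeration are assumptions of the example, not a method of the text.

```python
import math


def trinomial_moments(n, p, q):
    """Return (E(X^2), E(Y^2), E(XY)) for X = k - n p, Y = l - n q, where
    k, l are the frequencies of E and F in n independent trials."""
    r = 1.0 - p - q
    ex2 = ey2 = exy = 0.0
    for k in range(n + 1):
        for l in range(n + 1 - k):
            m = n - k - l
            # multinomial coefficient n! / (k! l! m!)
            coeff = math.factorial(n) // (
                math.factorial(k) * math.factorial(l) * math.factorial(m))
            prob = coeff * p ** k * q ** l * r ** m
            x, y = k - n * p, l - n * q
            ex2 += prob * x * x
            ey2 += prob * y * y
            exy += prob * x * y
    return ex2, ey2, exy
```

With `n = 12`, `p = 0.2`, `q = 0.5` the three sums agree with $np(1 - p) = 1.92$, $nq(1 - q) = 3$ and $-npq = -1.2$, so that $r_n = -0.5$.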

The quantities denoted by $f_k$, $g_k$ in Sec. 9 are in our case

$$f_k = E|\xi_k|^3 = p(1 - p)^3 + (1 - p)p^3$$
$$g_k = E|\eta_k|^3 = q(1 - q)^3 + (1 - q)q^3.$$

Hence

$$\omega_n = \frac{p(1 - p)^3 + (1 - p)p^3}{n^{1/2}p^{3/2}(1 - p)^{3/2}}, \qquad \eta_n = \frac{q(1 - q)^3 + (1 - q)q^3}{n^{1/2}q^{3/2}(1 - q)^{3/2}},$$

and the conditions

$$\omega_n \to 0, \qquad \eta_n \to 0$$

are satisfied. The fundamental theorem, therefore, can be applied.
If $k, l, m$ are the respective frequencies of events $E, F, G$ in $n$ trials, the
quantities $X$ and $Y$ represent the discrepancies

$$\lambda = k - np, \qquad \mu = l - nq.$$

Introducing the third discrepancy

$$\nu = m - nr$$

we shall have

$$\lambda + \mu + \nu = 0$$

so that $\nu$ is determined when $\lambda$ and $\mu$ are given. The last two quantities,
however, may have various values depending on chance. Concerning
them the following statement follows from the fundamental theorem:

Theorem. The probability that discrepancies $\lambda$, $\mu$ in $n$ trials shall
simultaneously satisfy the inequalities

$$\alpha_0\sqrt{n} < \lambda < \alpha_1\sqrt{n}; \qquad \beta_0\sqrt{n} < \mu < \beta_1\sqrt{n}$$

tends uniformly, with indefinitely increasing $n$, to the limit

$$\frac{1}{2\pi\sqrt{pqr}}\int_{\alpha_0}^{\alpha_1}\int_{\beta_0}^{\beta_1} e^{-\frac{1}{2}\left(\frac{\alpha^2}{p} + \frac{\beta^2}{q} + \frac{\gamma^2}{r}\right)}\,d\alpha\,d\beta$$

where, to have symmetrical notation, $\gamma$ is a variable defined by

$$\alpha + \beta + \gamma = 0.$$

On account of symmetry, perfectly similar statements can be made in
regard to any two pairs of discrepancies $\lambda, \mu, \nu$.

Since the fundamental theorem and its proof can be extended without
any difficulty to vectors of more than two dimensions, we shall have
in the case of trials with more than three alternatives a result perfectly
analogous to the last theorem.

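Theorem. Each of $n$ independent trials admits of $k$ alternatives $E_1$,
$E_2, \ldots E_k$ the probabilities and the frequencies of which respectively are
$p_1, p_2, \ldots p_k$ and $m_1, m_2, \ldots m_k$. The probability that the discrepancies
$m_i - np_i$ $(i = 1, 2, \ldots k - 1)$ should satisfy simultaneously the
inequalities

$$\alpha_i\sqrt{n} < m_i - np_i < \beta_i\sqrt{n}$$

tends uniformly, with indefinitely increasing $n$, to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2\cdots p_k}}\int_{\alpha_1}^{\beta_1}\cdots\int_{\alpha_{k-1}}^{\beta_{k-1}} e^{-\frac{1}{2}\left(\frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k}\right)}\,dt_1\,dt_2\cdots dt_{k-1}$$

where

$$t_k = -(t_1 + t_2 + \cdots + t_{k-1}).$$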

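From this theorem, by resorting to the definition of a multiple integral,
we may deduce an important corollary: Let $P_n$ denote the probability of the
inequality

$$\frac{(m_1 - np_1)^2}{np_1} + \frac{(m_2 - np_2)^2}{np_2} + \cdots + \frac{(m_k - np_k)^2}{np_k} < \chi^2. \qquad (A)$$

Then, as $n$ tends to infinity $P_n$ tends to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2\cdots p_k}}\int\int\cdots\int e^{-\frac{1}{2}\varphi}\,dt_1\,dt_2\cdots dt_{k-1}$$

where the integration is extended over the $(k - 1)$ dimensional ellipsoid

$$\varphi = \frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k} < \chi^2.$$

It is easy to see that the determinant of the quadratic form $\varphi$ in
$(k - 1)$ variables is $(p_1p_2\cdots p_k)^{-1}$. Hence, by a proper linear
transformation the above integral reduces to

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}}\int\int\cdots\int e^{-\frac{1}{2}(v_1^2 + v_2^2 + \cdots + v_{k-1}^2)}\,dv_1\,dv_2\cdots dv_{k-1}$$

the domain of integration being $v_1^2 + v_2^2 + \cdots + v_{k-1}^2 < \chi^2$. But
this multiple integral, as will be shown in Chap. XVI, Sec. 1, can be
reduced to a simple integral

$$\frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{0}^{\chi} e^{-\frac{1}{2}u^2}u^{k-2}\,du.$$

Thus

$$\lim P_n = \frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{0}^{\chi} e^{-\frac{1}{2}u^2}u^{k-2}\,du.$$
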

The probability $Q_n = 1 - P_n$ of the opposite inequality

$$\frac{(m_1 - np_1)^2}{np_1} + \frac{(m_2 - np_2)^2}{np_2} + \cdots + \frac{(m_k - np_k)^2}{np_k} \geq \chi^2$$

tends to the limit

$$\frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi}^{\infty} e^{-\frac{1}{2}u^2}u^{k-2}\,du$$

and for large $n$ we have an approximate formula

$$Q_n \approx \frac{1}{2^{\frac{k-3}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi}^{\infty} e^{-\frac{1}{2}u^2}u^{k-2}\,du$$

but the degree of approximation remains unknown. In practice, to
test whether the observed deviations of frequencies from their expected
values are significant, the value of the sum (A), say $\chi^2$, is found; then
by the above approximate formula the probability that the sum (A) will
be greater than $\chi^2$ is computed. If this probability is very small, then
the obtained system of deviations is significantly different from what
could be expected as a result of chance alone. The lack of information
as to the error incurred by using an approximate expression of $Q_n$ renders
the application of this "$\chi^2$-test" devised by Pearson somewhat dubious.
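
The practical procedure just described can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, the tail integral is evaluated by a plain midpoint rule, and the cutoff `upper` (beyond which the integrand is treated as negligible) is an arbitrary choice made for the example.

```python
import math


def pearson_sum(freqs, probs):
    """Sum (A): sum of (m_i - n p_i)^2 / (n p_i) over all alternatives."""
    n = sum(freqs)
    return sum((m - n * p) ** 2 / (n * p) for m, p in zip(freqs, probs))


def chi_tail(chi, k, upper=40.0, steps=100000):
    """Approximate limit of Q_n for k alternatives: the integral of
    exp(-u^2 / 2) u^(k - 2) from chi to infinity, divided by
    2^((k - 3) / 2) * Gamma((k - 1) / 2)."""
    if chi >= upper:
        return 0.0
    const = 1.0 / (2.0 ** ((k - 3) / 2.0) * math.gamma((k - 1) / 2.0))
    h = (upper - chi) / steps
    total = 0.0
    for i in range(steps):
        u = chi + (i + 0.5) * h          # midpoint of the i-th subinterval
        total += math.exp(-0.5 * u * u) * u ** (k - 2)
    return const * total * h
```

To apply it, one would compute `x2 = pearson_sum(freqs, probs)` and then `chi_tail(math.sqrt(x2), len(freqs))`. For $k = 3$ the integral can be found in closed form and equals $e^{-\chi^2/2}$, which gives a convenient check.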


Hypothetical Explanation of Empirically Verified Cases of 

Normal Correlation 

13. Normal distribution in two dimensions plays an important part 
in target practice. It is generally assumed on the basis of varied evidence 
collected in actual target practice that points of a target hit by projectiles 
are scattered in a manner suggesting normal distribution. By referring 
points hit by projectiles to a fixed coordinate system on the target, it is 
possible from their coordinates to find approximately (provided the 
number of shots is large) the elements of ellipses of equal probability. 
Dividing the surface of the target into regions of equal probabilities as 
described in Sec. 4, and counting the actual number of hits in each 
region, the resulting numbers in many reported instances are nearly 
equal. That and the agreement with other criteria are generally con- 
sidered as evidence in favor of assuming the probability in target 
practice to be normally distributed. 

Two-dimensional normal distribution or normal correlation has been 
found to exist between measurable attributes, such as the length of the 
body and weight of living organisms. Attributes like statures of parents 
and their descendants, according to Galton, again show evidence of 
normal correlation. 

Facing such a variety of facts pointing to the existence of normal 
correlation, one is tempted to account for it by some more or less plausible 
hypothesis. It is generally assumed that deviations of two magnitudes 
from their mean values are caused by the combined action of a great 
many independent causes, each affecting both magnitudes in a very small 
degree. Clearly, the resulting deviations under such circumstances may 
be regarded as components of the sum of a great many independent 
vectors. Then, to explain the existence of normal correlation, reference 
is made to the fundamental theorem in Sec. 11. 

Problems for Solution 

1 . Let p denote the probability that two normally distributed variables (with 
means = 0) will have values of opposite signs. Show that between p and the corre- 
lation coefficient r the following relation holds: 


$$r = \cos p\pi.$$


2. Variables $x, y$ (with the means = 0) are normally distributed. Show that the
probability for the point $x, y$ to be located in an ellipse

$$\frac{x^2}{\sigma_1^2} - 2r\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} = l$$

is greater than the probability corresponding to any other domain of the same area.

3. Three dice colored in white, red, and blue are tossed simultaneously n times. 
Let X and Y represent the total number of points on pairs : white, red and white, blue. 
Show that the probability of simultaneous inequalities 


7n + ^ 0 < X < 771 + iiV 7n -f roV^ < 7 < 7n + n 
tends to the limit 



~U^-tr+r^)dtdT 


as $n \to \infty$.
4. Three dice, white, red, and blue, are tossed simultaneously $n$ times. If $k$ and $l$
are frequencies of 10 points on pairs: white, red; red, blue; show that the probability 
of simultaneous inequalities 



tends to the limit 


11 Tri 

2^\/ 120 Jfo Jro 

as $n \to \infty$.

5. Two players, A and B, take part in a game arranged as follows: Each time one
ball is taken from an urn containing 8 white, 6 black, and 1 red ball; if this ball is 

white, A and B both gain $1 ; 
black, A loses $2, B loses $4; 
red, A gains $4, B gains $16. 

Let $s_n$ and $\sigma_n$ be the sums gained by A and B after $n$ games. Show that the probability
of simultaneous inequalities 

Iq^/ ^ Sn ti\/ ^71) Tq'S/ 48W K OTn Ti\/ 4S>71 

for very large n will be approximately equal to 

^ %) U tJTQ 

Note that the probability of the inequality $s_n\sigma_n < 0$ is about 0.13, which is not very small,
so that it is not very unlikely that the luck will be with one player and against another.
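
The figure 0.13 can be reproduced from the data of the problem. The sketch below assumes that the limiting normal law applies and uses the relation $r = \cos p\pi$ of Problem 1; the names `OUTCOMES`, `moments` and `opposite_sign_probability` are hypothetical.

```python
import math
from fractions import Fraction

# (probability, gain of A, gain of B) for a white, black, or red ball
OUTCOMES = [
    (Fraction(8, 15), 1, 1),
    (Fraction(6, 15), -2, -4),
    (Fraction(1, 15), 4, 16),
]


def moments(table):
    """Exact means, variances and covariance of the gains in a single game."""
    mean_a = sum(w * a for w, a, b in table)
    mean_b = sum(w * b for w, a, b in table)
    var_a = sum(w * a * a for w, a, b in table) - mean_a ** 2
    var_b = sum(w * b * b for w, a, b in table) - mean_b ** 2
    cov = sum(w * a * b for w, a, b in table) - mean_a * mean_b
    return mean_a, mean_b, var_a, var_b, cov


def opposite_sign_probability(r):
    """Probability of opposite signs for normal correlation r (r = cos(p*pi))."""
    return math.acos(r) / math.pi
```

Both means vanish, the variances per game are 16/5 and 24, the covariance is 8, and the resulting probability of opposite signs is close to 0.134.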

6. Concentric circles $C_1, C_2, C_3, \ldots$ in unlimited numbers are described about
the origin $O$. Points $P_1, P_2, P_3, \ldots$ are taken at random on these circles. Let $R$
be the end point of the vector representing the sum of vectors $OP_1, OP_2, OP_3, \ldots$.
If $r_1, r_2, r_3, \ldots$ are radii of $C_1, C_2, C_3, \ldots$ and the condition


rl+rl + ‘ 

(r? 4- r- + • • • + rl)^ 


as n 00 


is fulfilled, show that the probability that $R$ will lie within the circle described with the
radius $\rho$ about the origin will be very nearly equal to

$$1 - e^{-\frac{\rho^2}{r_1^2 + r_2^2 + \cdots + r_n^2}}$$

for large $n$.
CHAPTER XVI


DISTRIBUTION OF CERTAIN FUNCTIONS OF NORMALLY 
DISTRIBUTED VARIABLES 


1. In modern statistics much emphasis is laid upon distributions of
certain functions involving normally distributed variables. Such
distributions are considered as a basis for various "tests of significance"
for small samples, that is, when the number of observed data is small.
Some of the most important cases of this kind will be considered in this
chapter.

Problem 1. Independent variables $x_1, x_2, \ldots x_n$ are normally
distributed about their common mean = 0 with the same standard
deviation $\sigma$. Find the distribution function of the sum of their squares

$$s = x_1^2 + x_2^2 + \cdots + x_n^2.$$

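Solution. The inequality

$$x_1^2 < t$$

being equivalent to

$$-\sqrt{t} < x_1 < \sqrt{t},$$

the distribution function of $x_1^2$ is

$$F_1(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\sqrt{t}}^{\sqrt{t}} e^{-\frac{x^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{t} e^{-\frac{u}{2\sigma^2}}u^{-\frac{1}{2}}\,du \quad \text{for } t \geq 0$$

$$F_1(t) = 0 \quad \text{for } t < 0.$$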

Hence, the characteristic function of any one of the variables $x_1^2$,
$x_2^2, \ldots x_n^2$ is


<rV27rJo 

and that of their sum 




Consequently, the distribution function of s is expressed by 


F(t) = C + 
and it remains to transform this integral. To this end, imagine a variable
distributed over the interval $(0, +\infty)$ with the density

$$\frac{1}{(\sigma\sqrt{2})^n\,\Gamma\left(\frac{n}{2}\right)}\, e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2} - 1}.$$


Its characteristic function is 


(,V5)-(i - il)-^ 

and since the distribution function is given a priori, we must have for 
4^0 



du = const. + 



Hence 


F{t) = const. + 


1 



1 


du. 


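The constant must be = 0 since $F(t)$ as well as the integral in the right
member vanishes for $t = 0$. The final expression is therefore:

$$F(t) = \frac{1}{(\sigma\sqrt{2})^n\,\Gamma\left(\frac{n}{2}\right)}\int_{0}^{t} e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2} - 1}\,du \quad \text{for } t \geq 0$$

$$F(t) = 0 \quad \text{for } t \leq 0.$$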
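
The final expression lends itself to a direct numerical evaluation. The following is a sketch only: the name `sum_of_squares_cdf` is hypothetical, the midpoint rule is chosen for brevity, and the routine is restricted to $n \geq 2$ because for $n = 1$ the integrand is unbounded at $u = 0$ and this simple rule would be inaccurate.

```python
import math


def sum_of_squares_cdf(t, n, sigma=1.0, steps=100000):
    """Distribution function F(t) of x_1^2 + ... + x_n^2 for independent
    normal variables with mean 0 and standard deviation sigma."""
    if n < 2:
        raise ValueError("this simple quadrature is intended for n >= 2")
    if t <= 0:
        return 0.0
    const = 1.0 / ((sigma * math.sqrt(2.0)) ** n * math.gamma(n / 2.0))
    h = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h                # midpoint of the i-th subinterval
        total += math.exp(-u / (2.0 * sigma * sigma)) * u ** (n / 2.0 - 1.0)
    return const * total * h
```

For $n = 2$ the integral is elementary and gives $F(t) = 1 - e^{-t/2\sigma^2}$; for $n = 4$ and $\sigma = 1$ it gives $1 - e^{-t/2}(1 + t/2)$. Both serve as checks of the routine.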

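The probability of the inequality

$$x_1^2 + x_2^2 + \cdots + x_n^2 < t, \qquad t \geq 0$$

on the other hand, can be expressed directly as a multiple integral

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^n}\int\int\cdots\int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n$$

extended over the volume of the $n$-dimensional sphere $S$

$$x_1^2 + x_2^2 + \cdots + x_n^2 < t.$$

By equating both expressions of $F(t)$, we obtain an important transformation,

$$\int\int\cdots\int_{(S)} e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n = \frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2}\right)}\int_{0}^{t} e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2} - 1}\,du. \qquad (1)$$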


+ xl) is an arbitrary function of 
u = xl + xl+ • • • + xl 


If Fixl + xl + 
the integral 

extended over the whole ^-dimensional space represents the mathematical 
expectation of F(u). On the other hand, the distribution function of 
u being known the same multiple integral will be equal to 


• . • + Xn ^ 

e 2 cr 2 


4 “ xl)dxidx2 


dXn 


1 r “ 

I 0 2a 


n — 2 

2 du. 


Taking in particular cr = 1, F(u) = we get the formula 

r r r * • • +^^)+av^t^+ — 

II * * • j e 2 dxidx 2 * * • dXn = 


(2) 


__ 7r2 r 


» -Uaui 


n-2 


U 2 dUj 


^*2 


which will be used later. 

2. Problem 2. Variables $x_1, x_2, \ldots x_n$ are defined as in Prob. 1.
Denoting their arithmetic mean by

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

find the distribution function of the sum

$$S = (x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2.$$

Solution. The probability of the inequality

$$S < t$$

is expressed by the multiple integral

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n$$

extended over the volume of the $n$-dimensional ellipsoid

$$(x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2 < t.$$

Let

$$x_1 - s = u_1, \quad x_2 - s = u_2, \quad \ldots \quad x_n - s = u_n,$$

whence

$$u_1 + u_2 + \cdots + u_n = 0$$

and

$$x_1^2 + x_2^2 + \cdots + x_n^2 = u_1^2 + u_2^2 + \cdots + u_n^2 + ns^2.$$


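Taking $u_1, u_2, \ldots u_{n-1}$, and $s$ for new variables, we must first find the
Jacobian $J$ of $x_1, x_2, \ldots x_n$ with respect to $u_1, u_2, \ldots u_{n-1}, s$. It is

$$J = \begin{vmatrix}
1 & 1 & 0 & 0 & \cdots & 0 \\
1 & 0 & 1 & 0 & \cdots & 0 \\
1 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & 0 & \cdots & 1 \\
1 & -1 & -1 & -1 & \cdots & -1
\end{vmatrix}
=
\begin{vmatrix}
1 & 1 & 0 & 0 & \cdots & 0 \\
1 & 0 & 1 & 0 & \cdots & 0 \\
1 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & 0 & \cdots & 1 \\
n & 0 & 0 & 0 & \cdots & 0
\end{vmatrix},$$

the second determinant being obtained by adding all the other rows to the
last one; its absolute value is $n$.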

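In the new variables the expression for $F(t)$ will be

$$F(t) = \frac{n}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{ns^2}{2\sigma^2} - \frac{u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,ds\,du_1\,du_2\cdots du_{n-1}$$

and the domain of integration in the space of the new variables is defined
by

$$-\infty < s < \infty$$

$$u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 < t.$$

After performing the integration with respect to $s$, we get

$$F(t) = \frac{\sqrt n}{(\sigma\sqrt{2\pi})^{n-1}}\int\!\!\int\cdots\int e^{-\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,du_1\,du_2\cdots du_{n-1}.$$

The quadratic form

$$\varphi = u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2$$

can be represented as a sum of the squares of $(n - 1)$ linear forms in
variables $u_1, u_2, \ldots u_{n-1}$:

$$\varphi = v_1^2 + v_2^2 + \cdots + v_{n-1}^2.$$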

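The Jacobian

$$\frac{\partial(v_1, v_2, \ldots v_{n-1})}{\partial(u_1, u_2, \ldots u_{n-1})}$$

is the square root of the determinant of the form $\varphi$, which is the same
as the determinant of linear forms

$$\frac12\frac{\partial\varphi}{\partial u_1} = 2u_1 + u_2 + \cdots + u_{n-1}$$

$$\frac12\frac{\partial\varphi}{\partial u_2} = u_1 + 2u_2 + \cdots + u_{n-1}$$

$$\cdots$$

$$\frac12\frac{\partial\varphi}{\partial u_{n-1}} = u_1 + u_2 + \cdots + 2u_{n-1}.$$

Now, in general, for a determinant of order $p$,

$$\begin{vmatrix}
\lambda & 1 & 1 & \cdots & 1 \\
1 & \lambda & 1 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 1 & \cdots & \lambda
\end{vmatrix} = (\lambda - 1)^{p-1}(\lambda + p - 1)$$

so that the determinant of $\varphi$ is $= n$, whence

$$\frac{\partial(v_1, v_2, \ldots v_{n-1})}{\partial(u_1, u_2, \ldots u_{n-1})} = \sqrt n$$

and

$$\frac{\partial(u_1, u_2, \ldots u_{n-1})}{\partial(v_1, v_2, \ldots v_{n-1})} = \frac{1}{\sqrt n}.$$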

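Therefore, taking $v_1, v_2, \ldots v_{n-1}$ for new variables, $F(t)$ can be expressed
as follows

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^{n-1}}\int\!\!\int\cdots\int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1}$$

where the integral is extended over the volume of the sphere

$$v_1^2 + v_2^2 + \cdots + v_{n-1}^2 < t.$$

This multiple integral is exactly of the type considered in the preceding
problem, and it can be reduced to a simple integral as follows

$$\int\!\!\int\cdots\int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1} = \frac{\pi^{\frac{n-1}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}} u^{\frac{n-3}{2}}\,du.$$

After substitution, the final expression of $F(t)$ is

$$F(t) = \frac{1}{(\sigma\sqrt2)^{n-1}\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}} u^{\frac{n-3}{2}}\,du \quad \text{for } t > 0$$

$$F(t) = 0 \quad \text{for } t \leq 0.$$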
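The determinant identity used in this solution, $(\lambda - 1)^{p-1}(\lambda + p - 1)$ for a matrix of order $p$ with $\lambda$ on the diagonal and 1 elsewhere, is easy to confirm for small orders. The sketch below is only an illustration; the helper names `determinant` and `lambda_matrix` are assumptions made for this example. It uses exact rational arithmetic, and with $\lambda = 2$, $p = n - 1$ it reproduces the value $n$ found for the determinant of $\varphi$.

```python
from fractions import Fraction

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def lambda_matrix(lam, p):
    """Matrix of order p with lam on the diagonal and 1 everywhere else."""
    return [[lam if i == j else 1 for j in range(p)] for i in range(p)]

if __name__ == "__main__":
    for n in range(2, 8):
        # Determinant of the form phi of Prob. 2: lam = 2, order n - 1.
        print(n, determinant(lambda_matrix(2, n - 1)))
```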


3. Problem 3. Variables $x_1, x_2, \ldots x_n$ are defined as in Prob. 1.
As in Prob. 2, we set

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

$$u_i = x_i - s; \qquad i = 1, 2, \ldots n$$

and introduce the quantity

$$\epsilon = \sqrt{\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{n}}.$$

What is the distribution function of the ratio

$$\frac{s}{\epsilon}$$

or, which is the same, the probability $F(t)$ of the inequality

$$s < t\epsilon?$$

Solution. First, assuming $t$ to be positive, let us find the probability
$\Phi(t)$ of the inequality

$$s \geq t\epsilon$$

or

$$u_1^2 + u_2^2 + \cdots + u_n^2 \leq \frac{ns^2}{t^2}.$$

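This probability can be presented in the form

$$\Phi(t) = \frac{n}{(\sigma\sqrt{2\pi})^n}\int_0^\infty e^{-\frac{ns^2}{2\sigma^2}}\,\psi(s)\,ds$$

where the multiple integral

$$\psi(s) = \int\!\!\int\cdots\int e^{-\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,du_1\,du_2\cdots du_{n-1}$$

in which

$$u_n = -(u_1 + u_2 + \cdots + u_{n-1})$$

is extended over the domain

$$u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 \leq \frac{ns^2}{t^2}.$$

Proceeding in exactly the same manner as in Prob. 2, we can transform
$\psi(s)$ into

$$\psi(s) = \frac{1}{\sqrt n}\int\!\!\int\cdots\int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1}$$

extended over the sphere

$$v_1^2 + v_2^2 + \cdots + v_{n-1}^2 \leq \frac{ns^2}{t^2}$$

in the space of the variables $v_1, v_2, \ldots v_{n-1}$. For this multiple integral
we can substitute a simple integral

$$\frac{\pi^{\frac{n-1}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^{\frac{ns^2}{t^2}} e^{-\frac{u}{2\sigma^2}} u^{\frac{n-3}{2}}\,du = \frac{2\pi^{\frac{n-1}{2}} n^{\frac{n-1}{2}} s^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^{\frac1t} e^{-\frac{ns^2\xi^2}{2\sigma^2}}\,\xi^{n-2}\,d\xi$$

and thus reduce $\psi(s)$ to the form

$$\psi(s) = \frac{2\pi^{\frac{n-1}{2}} n^{\frac{n-2}{2}} s^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^{\frac1t} e^{-\frac{ns^2\xi^2}{2\sigma^2}}\,\xi^{n-2}\,d\xi.$$
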

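After substitution we can express $\Phi(t)$ as a repeated integral

$$\Phi(t) = \frac{2n^{\frac n2}}{\sqrt\pi\,(\sigma\sqrt2)^n\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^\infty\int_0^{\frac1t} e^{-\frac{ns^2}{2\sigma^2} - \frac{ns^2\xi^2}{2\sigma^2}}\,s^{n-1}\xi^{n-2}\,ds\,d\xi.$$

The derivative of $\Phi(t)$ is

$$\Phi'(t) = -\frac{2n^{\frac n2}\,t^{-n}}{\sqrt\pi\,(\sigma\sqrt2)^n\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^\infty e^{-\frac{ns^2}{2\sigma^2}\left(1 + \frac{1}{t^2}\right)} s^{n-1}\,ds = -\frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\cdot\frac{1}{(1 + t^2)^{\frac n2}}$$

whence

$$\Phi(t) = C - \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^t\frac{dz}{(1 + z^2)^{\frac n2}}.$$

Now

$$\frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{\infty}\frac{dz}{(1 + z^2)^{\frac n2}} = 1$$

and $\Phi(t)$ tends to 0 when $t$ increases indefinitely, so that $C = 1$ and

$$\Phi(t) = 1 - \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^t\frac{dz}{(1 + z^2)^{\frac n2}}.$$

Such is the probability of the inequality

$$s \geq t\epsilon.$$

The probability $F(t)$ of the inequality

$$s < t\epsilon$$

will be $1 - \Phi(t)$ or

$$F(t) = \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^t\frac{dz}{(1 + z^2)^{\frac n2}}$$
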
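but this is established only for positive $t$. However, this result holds
for negative $t$ as well. For $t$ being negative $= -\tau$ the inequality

$$s < -\tau\epsilon$$

is entirely equivalent to

$$-s > \tau\epsilon$$

and its probability is evidently

$$F(-\tau) = \Phi(\tau) = 1 - \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{\tau}(1 + z^2)^{-\frac n2}\,dz.$$

But

$$\frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{\infty}(1 + z^2)^{-\frac n2}\,dz = 1$$

which permits of writing the preceding expression for $F(-\tau)$ as follows:

$$F(-\tau) = \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{\tau}^{\infty}(1 + z^2)^{-\frac n2}\,dz = \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{-\tau}(1 + z^2)^{-\frac n2}\,dz.$$

Thus, no matter whether $t$ is positive or negative, the distribution
function of the ratio

$$\frac{s}{\epsilon}$$

or the probability of the inequality

$$s < t\epsilon$$

is given by

$$F(t) = \frac{\Gamma\left(\frac n2\right)}{\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^t\frac{dz}{(1 + z^2)^{\frac n2}}.$$
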
The distribution of the quotient $s/\epsilon$ was discovered by a British
statistician who wrote under the pseudonym "Student," and it is
commonly referred to as "Student's distribution." The first rigorous proof
was published by R. A. Fisher.
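As a numerical illustration of the distribution function of $s/\epsilon$, the sketch below evaluates $F(t)$ after the substitution $z = \tan\alpha$, which turns $dz/(1 + z^2)^{n/2}$ into $\cos^{n-2}\alpha\,d\alpha$ on a finite interval. The function name and the use of Simpson's rule are assumptions made for this example only. Two cases with elementary closed forms serve as checks: for $n = 2$, $F(t) = \frac12 + \frac1\pi\arctan t$, and for $n = 3$, $F(t) = \frac12\left(1 + t/\sqrt{1 + t^2}\right)$.

```python
import math

def student_ratio_cdf(t, n, steps=4000):
    """F(t) = Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2)) * integral_{-inf}^{t} (1+z^2)^(-n/2) dz.

    Hypothetical helper; requires n >= 2 and an even number of steps.
    """
    const = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    # With z = tan(alpha): dz / (1 + z^2)^(n/2) = cos(alpha)^(n-2) d(alpha).
    lo, hi = -math.pi / 2, math.atan(t)

    def g(alpha):
        return math.cos(alpha) ** (n - 2)

    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return const * total * h / 3

if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, student_ratio_cdf(0.0, n), student_ratio_cdf(1.0, n))
```

By symmetry the value at $t = 0$ should be $\frac12$ for every $n$.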

4. Problem 4. Variables $x$, $y$ are in normal correlation. A sample of
$n$ corresponding pairs, $x_1, y_1$; $x_2, y_2$; $\ldots$ $x_n, y_n$ is taken and the
"correlation coefficient of the sample" is found by the formula

$$\rho = \frac{\sum(x_i - s)(y_i - s')}{\sqrt{\sum(x_i - s)^2\cdot\sum(y_i - s')^2}}$$

where, for the sake of abbreviation,

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}; \qquad s' = \frac{y_1 + y_2 + \cdots + y_n}{n}.$$

Find the distribution function of $\rho$, that is, the probability $P$ of the
inequality $\rho < R$ for a given $R$ $(-1 < R < 1)$.

Solution. Since the expression of $\rho$ is homogeneous of degree 0 in
$x_1, x_2, \ldots x_n$; $y_1, y_2, \ldots y_n$ we can assume $\sigma_1 = \sigma_2 = 1$. Also without
loss of generality the expectations of $x$ and $y$ may be supposed $= 0$.
Denoting by $r$ the correlation coefficient of $x$ and $y$, the density of
probability in the two-dimensional distribution will be:

$$\frac{1}{2\pi\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}(x^2 + y^2 - 2rxy)}.$$

Hence the required probability will be expressed by the multiple integral

$$P = \frac{1}{(2\pi)^n(1 - r^2)^{\frac n2}}\int\!\!\int\cdots\int e^{-\frac{\varphi}{2(1 - r^2)}}\,dx_1\cdots dx_n\,dy_1\cdots dy_n$$

extended over the $2n$-dimensional domain

$$\sum(x_i - s)(y_i - s') < R\sqrt{\sum(x_i - s)^2\cdot\sum(y_i - s')^2} \tag{3}$$

and

$$\varphi = \sum x_i^2 + \sum y_i^2 - 2r\sum x_iy_i. \tag{4}$$

Replacing $x_i$, $y_i$ $(i = 1, 2, \ldots n)$, respectively, by $\sqrt{1 - r^2}\,x_i$, $\sqrt{1 - r^2}\,y_i$,
we can write $P$ thus:

$$P = \frac{(1 - r^2)^{\frac n2}}{(2\pi)^n}\int\!\!\int\cdots\int e^{-\frac12\varphi}\,dx_1\cdots dx_n\,dy_1\cdots dy_n$$

while (3) and (4) still hold but with the new notation for the variables.
Let us set now

$$x_i - s = u_i, \qquad y_i - s' = v_i,$$

then

$$u_1 + u_2 + \cdots + u_n = 0, \qquad v_1 + v_2 + \cdots + v_n = 0.$$

Introducing $s$, $s'$; $u_1, u_2, \ldots u_{n-1}$; $v_1, v_2, \ldots v_{n-1}$ as new variables, we
find as in Sec. 2

$$P = \frac{n^2(1 - r^2)^{\frac n2}}{(2\pi)^n}\int\!\!\int\cdots\int e^{-\frac12\varphi}\,ds\,ds'\,du_1\cdots du_{n-1}\,dv_1\cdots dv_{n-1}$$

where

$$\varphi = ns^2 + ns'^2 - 2nrss' + \sum u_i^2 + \sum v_i^2 - 2r\sum u_iv_i$$

and the domain of integration is defined by the inequalities

$$-\infty < s < \infty; \qquad -\infty < s' < \infty$$

$$\sum u_iv_i < R\sqrt{\sum u_i^2\cdot\sum v_i^2}.$$

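Now by the same linear transformation the quadratic forms $\sum u_i^2$,
$\sum v_i^2$ (each containing $n - 1$ independent variables) can be transformed
into

$$\sum_{i=1}^{n-1} w_i^2, \qquad \sum_{i=1}^{n-1} z_i^2;$$

at the same time

$$\sum_{i=1}^{n} u_iv_i = \sum_{i=1}^{n-1} w_iz_i.$$

Proceeding as in Sec. 2 and noting that

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac n2(s^2 + s'^2 - 2rss')}\,ds\,ds' = \frac{2\pi}{n\sqrt{1 - r^2}}$$

we find that

$$P = \frac{(1 - r^2)^{\frac{n-1}{2}}}{(2\pi)^{n-1}}\int\!\!\int\cdots\int e^{-\frac12\psi}\,dw_1\cdots dw_{n-1}\,dz_1\cdots dz_{n-1}$$

where

$$\psi = \sum w_i^2 + \sum z_i^2 - 2r\sum w_iz_i$$

and the domain of integration in the space of $2n - 2$ dimensions is defined
by

$$\sum w_iz_i < R\sqrt{\sum w_i^2\cdot\sum z_i^2}.$$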

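We shall integrate now in regard to variables $z_1, z_2, \ldots z_{n-1}$ for a fixed
system of values $w_1, w_2, \ldots w_{n-1}$. To this end we use an orthogonal
transformation

$$z_1 = c_{1,1}\zeta_1 + c_{1,2}\zeta_2 + \cdots + c_{1,n-1}\zeta_{n-1}$$

$$z_2 = c_{2,1}\zeta_1 + c_{2,2}\zeta_2 + \cdots + c_{2,n-1}\zeta_{n-1}$$

$$\cdots$$

$$z_{n-1} = c_{n-1,1}\zeta_1 + c_{n-1,2}\zeta_2 + \cdots + c_{n-1,n-1}\zeta_{n-1}$$

in which the elements of the first column are

$$c_{i,1} = \frac{w_i}{\sqrt{w_1^2 + w_2^2 + \cdots + w_{n-1}^2}} = \frac{w_i}{w}.$$

Defining $\xi_1, \xi_2, \ldots \xi_{n-1}$ by

$$w_1 = c_{1,1}\xi_1 + c_{1,2}\xi_2 + \cdots + c_{1,n-1}\xi_{n-1}$$

$$w_2 = c_{2,1}\xi_1 + c_{2,2}\xi_2 + \cdots + c_{2,n-1}\xi_{n-1}$$

$$\cdots$$

we shall have $\xi_1 = w$, $\xi_2 = \cdots = \xi_{n-1} = 0$. By the properties of
orthogonal transformations

$$\sum z_i^2 = \sum\zeta_i^2, \qquad \sum w_iz_i = \sum\xi_i\zeta_i = w\zeta_1,$$

so that for a fixed system of values $w_1, w_2, \ldots w_{n-1}$ the domain of
integration in the space of variables $\zeta_1, \zeta_2, \ldots \zeta_{n-1}$ will be

$$\zeta_1 < R\sqrt{\zeta_1^2 + \zeta_2^2 + \cdots + \zeta_{n-1}^2}. \tag{5}$$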

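Thus we must first evaluate the integral

$$J = \int\!\!\int\cdots\int e^{-\frac12(\zeta_1^2 + \zeta_2^2 + \cdots + \zeta_{n-1}^2) + rw\zeta_1}\,d\zeta_1\,d\zeta_2\cdots d\zeta_{n-1}.$$

If $\zeta_1 < 0$ no restriction is imposed upon $\zeta_2, \ldots \zeta_{n-1}$; if $\zeta_1 > 0$, then

$$\zeta_2^2 + \cdots + \zeta_{n-1}^2 > \frac{1 - R^2}{R^2}\,\zeta_1^2.$$

Consequently the result of integration in regard to $\zeta_2, \ldots \zeta_{n-1}$ can be
presented thus:

$$J = c\,e^{\frac{r^2w^2}{2}} - \int_0^\infty e^{-\frac12\zeta_1^2 + rw\zeta_1}\,d\zeta_1\int\!\!\int\cdots\int e^{-\frac12(\zeta_2^2 + \cdots + \zeta_{n-1}^2)}\,d\zeta_2\cdots d\zeta_{n-1}$$

where the inner integral is extended over the domain

$$\zeta_2^2 + \cdots + \zeta_{n-1}^2 < \frac{1 - R^2}{R^2}\,\zeta_1^2$$

and $c$ is a constant. Making use of formula (1), Sec. 1, the expression of
$J$ reduces to

$$J = c\,e^{\frac{r^2w^2}{2}} - \frac{2\pi^{\frac{n-2}{2}}}{\Gamma\left(\frac{n-2}{2}\right)}\int_0^\infty\int_0^{\zeta\frac{\sqrt{1 - R^2}}{R}} e^{-\frac12(\zeta^2 + \eta^2) + rw\zeta}\,\eta^{n-3}\,d\zeta\,d\eta.$$

This has to be multiplied by

$$\frac{1}{(2\pi)^{n-1}}(1 - r^2)^{\frac{n-1}{2}}\,e^{-\frac12w^2}$$

and integrated over the whole space of the variables $w_1, w_2, \ldots w_{n-1}$.
The resulting expression for $P$ will be

$$P = \text{const.} - \frac{(1 - r^2)^{\frac{n-1}{2}}}{2^{n-2}\pi^{\frac n2}\,\Gamma\left(\frac{n-2}{2}\right)}\int\!\!\int\cdots\int M\,dw_1\,dw_2\cdots dw_{n-1}$$

where

$$M = \int_0^\infty\int_0^{\zeta\frac{\sqrt{1 - R^2}}{R}} e^{-\frac12(w^2 + \zeta^2 + \eta^2) + rw\zeta}\,\eta^{n-3}\,d\zeta\,d\eta.$$
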
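Now we differentiate in regard to $R$, reverse the order of integrations,
and make use of formula (2), Sec. 1; the resulting value of $dP/dR$ will
then be expressed as a double integral

$$\frac{dP}{dR} = \frac{(1 - r^2)^{\frac{n-1}{2}}(1 - R^2)^{\frac{n-4}{2}}}{2^{n-3}\sqrt\pi\,\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(\frac{n-2}{2}\right)}\int_0^\infty\int_0^\infty e^{-\frac12(t^2 + u^2) + Rrtu}\,(tu)^{n-2}\,dt\,du,$$

or

$$\frac{dP}{dR} = \frac{(1 - r^2)^{\frac{n-1}{2}}(1 - R^2)^{\frac{n-4}{2}}}{\pi\,\Gamma(n - 2)}\int_0^\infty\int_0^\infty e^{-\frac12(t^2 + u^2) + Rrtu}\,(tu)^{n-2}\,dt\,du,$$

since

$$\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(\frac{n-2}{2}\right) = \frac{\sqrt\pi}{2^{n-3}}\,\Gamma(n - 2).$$

In the double integral we make transformation to new variables $\xi$, $\eta$
defined by

$$\xi = \frac tu, \qquad \eta = tu.$$

The Jacobian of $t$, $u$ in regard to $\xi$, $\eta$ being $\frac{1}{2\xi}$, we have

$$\int_0^\infty\int_0^\infty e^{-\frac12(t^2 + u^2) + Rrtu}\,(tu)^{n-2}\,dt\,du = \frac12\Gamma(n - 1)\int_0^\infty\frac{d\xi}{\xi\left[\frac12\left(\xi + \frac1\xi\right) - Rr\right]^{n-1}} = \Gamma(n - 1)\int_0^\infty\frac{dt}{(\operatorname{ch} t - Rr)^{n-1}}$$

and so, finally,

$$\frac{dP}{dR} = \frac{n - 2}{\pi}(1 - r^2)^{\frac{n-1}{2}}(1 - R^2)^{\frac{n-4}{2}}\int_0^\infty\frac{dt}{(\operatorname{ch} t - Rr)^{n-1}}.$$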

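In case $r = 0$, that is, when the variables $x$, $y$ are uncorrelated, we have
a very simple expression of $P$:

$$P = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt\pi\,\Gamma\left(\frac{n-2}{2}\right)}\int_{-1}^{R}(1 - \rho^2)^{\frac{n-4}{2}}\,d\rho.$$

In case $r \neq 0$ the integral

$$\int_0^\infty\frac{dt}{(\operatorname{ch} t - Rr)^{n-1}}$$

can still be found in finite form. We have, in fact,

$$\int_0^\infty\frac{dt}{\operatorname{ch} t - Rr} = \frac{1}{\sqrt{1 - R^2r^2}}\left[\frac\pi2 + \arcsin(Rr)\right]$$

whence

$$\int_0^\infty\frac{dt}{(\operatorname{ch} t - Rr)^{n-1}} = \frac{1}{(n - 2)!}\,\frac{d^{\,n-2}}{d(Rr)^{n-2}}\left\{\frac{\frac\pi2 + \arcsin(Rr)}{\sqrt{1 - R^2r^2}}\right\}$$

and so

$$P = \frac{(1 - r^2)^{\frac{n-1}{2}}}{\pi\,(n - 3)!}\int_{-1}^{R}(1 - \rho^2)^{\frac{n-4}{2}}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\frac\pi2 + \arcsin(r\rho)}{\sqrt{1 - r^2\rho^2}}\right\}d\rho.$$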

When n is an even number, this integral appears in a very simple finite 
form, but in case of an odd n certain integrals of a rather complicated 
type appear. Besides, the behavior of P for somewhat large n cannot 
be easily grasped by using this integral expression for P. 
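The two closed forms entering this solution lend themselves to a quick numerical check. The sketch below is an illustration only; the helper names and the truncation of the infinite range at $t = 40$ are assumptions. It compares $\int_0^\infty dt/(\operatorname{ch} t - a)$ with $\left(\frac\pi2 + \arcsin a\right)/\sqrt{1 - a^2}$, and verifies that for $r = 0$ the factor $\frac{n-2}{\pi}\int_0^\infty dt/\operatorname{ch}^{n-1} t$ agrees with $\Gamma\left(\frac{n-1}{2}\right)/\sqrt\pi\,\Gamma\left(\frac{n-2}{2}\right)$.

```python
import math

def simpson(g, lo, hi, steps):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def cosh_integral(a, power, upper=40.0, steps=16000):
    """Approximate the integral of dt / (cosh t - a)**power over (0, infinity).

    The tail beyond `upper` is neglected; it is of the order exp(-upper).
    """
    return simpson(lambda t: (math.cosh(t) - a) ** (-power), 0.0, upper, steps)

def closed_form(a):
    """Value (pi/2 + arcsin a) / sqrt(1 - a^2) quoted in the text for power 1."""
    return (math.pi / 2 + math.asin(a)) / math.sqrt(1 - a * a)

def uncorrelated_constant(n):
    """(n - 2)/pi times the integral with Rr = 0 (the r = 0 case)."""
    return (n - 2) / math.pi * cosh_integral(0.0, n - 1)

if __name__ == "__main__":
    for a in (0.3, -0.6):
        print(a, cosh_integral(a, 1), closed_form(a))
    for n in (3, 4, 7):
        print(n, uncorrelated_constant(n))
```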

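5. Fisher, who was first to discover the rigorous distribution of the
correlation coefficient, called attention to the fact that, setting

$$\operatorname{th} z = \rho$$

the distribution of $z$ will be nearly normal even for comparatively small
values of $n$. Let us set $R = \operatorname{th}\omega$, $r = \operatorname{th}\zeta$; then $P$ can be expressed thus:

$$P = \frac{n - 2}{\pi}\int_{-\infty}^{\omega}\int_0^\infty\frac{\operatorname{ch} z\,dt\,dz}{(\operatorname{ch} t\operatorname{ch} z\operatorname{ch}\zeta - \operatorname{sh}\zeta\operatorname{sh} z)^{n-1}}.$$

Instead of $t$ it is convenient to introduce a new variable $\tau$ so that

$$\operatorname{ch} t\operatorname{ch} z\operatorname{ch}\zeta - \operatorname{sh}\zeta\operatorname{sh} z = \frac{\operatorname{ch}(z - \zeta)}{1 - \tau}.$$

Then

$$P = \frac{n - 2}{\pi\sqrt2}\int_{-\infty}^{\omega}\left(\frac{\operatorname{ch} z}{\operatorname{ch}\zeta}\right)^{\frac12}\frac{dz}{[\operatorname{ch}(z - \zeta)]^{n - \frac32}}\int_0^1\frac{\tau^{-\frac12}(1 - \tau)^{n-2}\,d\tau}{\sqrt{1 - p\tau}}$$

where

$$p = \frac{\operatorname{ch}(z + \zeta)}{2\operatorname{ch} z\operatorname{ch}\zeta} \leq \frac{\operatorname{ch}(\omega + \zeta)}{2\operatorname{ch}\omega\operatorname{ch}\zeta}$$
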

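for all values of $z$ under consideration. Now

$$\int_0^1\frac{\tau^{-\frac12}(1 - \tau)^{n-2}\,d\tau}{\sqrt{1 - p\tau}} > \int_0^1\tau^{-\frac12}(1 - \tau)^{n-2}\,d\tau = \frac{\sqrt\pi\,\Gamma(n - 1)}{\Gamma\left(n - \frac12\right)}$$

and

$$\int_0^1\frac{\tau^{-\frac12}(1 - \tau)^{n-2}\,d\tau}{\sqrt{1 - p\tau}} < \frac{\sqrt\pi\,\Gamma(n - 1)}{\Gamma\left(n - \frac12\right)}\left(1 + \frac{p}{2n - 1}\right)$$

since

$$\int_0^1\frac{\tau^{-\frac12}(1 - \tau)^{n-2}\,d\tau}{\sqrt{1 - p\tau}} < \int_0^1\tau^{-\frac12}(1 - \tau)^{n-2}(1 + p\tau)\,d\tau$$

for $0 < p < 1$ as can be easily verified. Consequently

$$P = \frac{(n - 2)\,\Gamma(n - 1)}{\sqrt{2\pi}\,\Gamma\left(n - \frac12\right)}\int_{-\infty}^{\omega}\left(\frac{\operatorname{ch} z}{\operatorname{ch}\zeta}\right)^{\frac12}\frac{dz}{[\operatorname{ch}(z - \zeta)]^{n - \frac32}}\cdot\left[1 + \theta\,\frac{\operatorname{ch}(\omega + \zeta)}{2\operatorname{ch}\omega\operatorname{ch}\zeta}\cdot\frac{1}{2n - 1}\right];$$

$$0 < \theta < 1.$$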


As to the integral in this formula, its approximate expression, omitting 
terms of the higher order, is: 





tH 

2n - 3 


Thus for somewhat large n the required value of P can be found with 
the help of a simple approximate formula. 

The various distributions dealt with in this chapter are undoubtedly 
of great value when applied to variables which have normal or nearly 
normal distribution. Whether they are always used legitimately can 
be doubted. At least the "onus probandi" that the "populations" with
which they deal are even approximately normal rests with the statisticians.


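Problems for Solution

1. Show that

$$\lim_{n\to\infty}\frac{1}{2^{\frac n2}\,\Gamma\left(\frac n2\right)}\int_0^{n + t\sqrt{2n}} e^{-\frac u2}\,u^{\frac n2 - 1}\,du = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac12u^2}\,du.$$

Hint: Liapounoff's theorem and Prob. 1, page 332.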
2. With the same assumptions and notations as in Prob. 3, page 336, show that the
distribution function of the quotient

$$\frac{x_i - s}{\epsilon}; \qquad i = 1, 2, \ldots n$$

is

$$F(t) = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi(n - 1)}\,\Gamma\left(\frac{n-2}{2}\right)}\int_{-\sqrt{n-1}}^{t}\left(1 - \frac{\tau^2}{n - 1}\right)^{\frac{n-4}{2}}d\tau$$

for $-\sqrt{n - 1} \leq t \leq \sqrt{n - 1}$;
$F(t) = 1$ if $t > \sqrt{n - 1}$; $F(t) = 0$ if $t < -\sqrt{n - 1}$.

It is worthy of notice that for $n = 4$ the distribution is uniform.

3. In two series of observations, samples $x_1, x_2, \ldots x_n$ and $y_1, y_2, \ldots y_{n'}$ from
the same normally distributed population (or of the same normally distributed
variable) are obtained. Denoting for brevity

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad s' = \frac{y_1 + y_2 + \cdots + y_{n'}}{n'}$$




find the distribution function of the quotient ('^Student'’)* -^wa. 
APPENDIX I


1. Euler's Summation Formula. Let $f(x)$ be a function with a
continuous derivative $f'(x)$ in an interval $(a, b)$ where $a$ and $b > a$ are
arbitrary real numbers. The notation

$$\sum_{n > a}^{n \leq b} f(n)$$

will be used to designate the sum extended over all integers $n$ which are
$> a$ and $\leq b$. It is an important problem to devise means for the
approximate evaluation of the above sum when it contains a considerable number
of terms.

Let $[x]$, as usual, denote the largest integer contained in a real number
$x$, so that

$$x = [x] + \theta$$

where $\theta$, so-called "fractional part" of $x$, satisfies the inequalities

$$0 \leq \theta < 1.$$

Considered as functions of a continuous variable $x$, both $[x]$ and $\theta$ have
discontinuities for integral values of $x$. The function

$$\rho(x) = \tfrac12 - \theta = [x] - x + \tfrac12$$

is likewise discontinuous for integral values of $x$. Besides, it is a periodic
function of $x$ with the period 1; that is, we have

$$\rho(x + 1) = \rho(x)$$

for any real $x$. With this notation adopted we have the following
important formula:

$$\sum_{n > a}^{n \leq b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \int_a^b\rho(x)f'(x)\,dx \tag{1}$$

which is known as "Euler's summation formula."

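Proof. Let $k$ be the least integer $> a$ and $l$ the greatest integer $\leq b$.
The sum in the left member of (1) is, by definition,

$$f(k) + f(k + 1) + \cdots + f(l)$$

and we must show that this is equal to the right member. To this end
we write first

$$\int_a^b\rho(x)f'(x)\,dx = \int_a^k\rho(x)f'(x)\,dx + \int_l^b\rho(x)f'(x)\,dx + \sum_{j=k}^{l-1}\int_j^{j+1}\rho(x)f'(x)\,dx.$$

Next, since $j$ is an integer,

$$\int_j^{j+1}\rho(x)f'(x)\,dx = \int_j^{j+1}\left(j - x + \tfrac12\right)f'(x)\,dx = -\frac{f(j) + f(j + 1)}{2} + \int_j^{j+1}f(x)\,dx$$

and

$$\sum_{j=k}^{l-1}\int_j^{j+1}\rho(x)f'(x)\,dx = -\frac{f(k) + f(l)}{2} - \sum_{n=k+1}^{l-1}f(n) + \int_k^l f(x)\,dx.$$

On the other hand,

$$\int_a^k\rho(x)f'(x)\,dx = \int_a^k\left(k - 1 - x + \tfrac12\right)f'(x)\,dx = -\tfrac12f(k) - \rho(a)f(a) + \int_a^k f(x)\,dx$$

$$\int_l^b\rho(x)f'(x)\,dx = \int_l^b\left(l - x + \tfrac12\right)f'(x)\,dx = -\tfrac12f(l) + \rho(b)f(b) + \int_l^b f(x)\,dx,$$

so that finally

$$\int_a^b\rho(x)f'(x)\,dx = -f(k) - f(k + 1) - \cdots - f(l) + \rho(b)f(b) - \rho(a)f(a) + \int_a^b f(x)\,dx;$$

whence

$$\sum_{n > a}^{n \leq b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \int_a^b\rho(x)f'(x)\,dx,$$

which completes the proof of Euler's formula.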
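Formula (1) can be tested numerically. The sketch below is illustrative only: the helper names and the choice of Simpson's rule are assumptions, and the integrals are split at the integers, where $\rho(x)$ jumps. It evaluates the right member of (1) and compares it with the direct sum, for a cubic (where Simpson's rule is exact up to rounding) and for $f(x) = \log x$ with integral limits.

```python
import math

def rho(x):
    """rho(x) = [x] - x + 1/2."""
    return math.floor(x) - x + 0.5

def simpson(g, lo, hi, steps=200):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def direct_sum(f, a, b):
    """Sum of f(n) over the integers n with a < n <= b."""
    return sum(f(n) for n in range(math.floor(a) + 1, math.floor(b) + 1))

def euler_right_member(f, df, a, b):
    """Right member of formula (1); df is the derivative of f."""
    # Break (a, b) at the integers strictly inside it.
    cuts = [a] + [float(j) for j in range(math.floor(a) + 1, math.ceil(b))] + [b]
    integral_f = 0.0
    integral_rho_df = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        j = math.floor(lo)  # integer part of every x strictly inside (lo, hi)
        integral_f += simpson(f, lo, hi)
        integral_rho_df += simpson(lambda x: (j - x + 0.5) * df(x), lo, hi)
    return integral_f + rho(b) * f(b) - rho(a) * f(a) - integral_rho_df

if __name__ == "__main__":
    cube = lambda x: x ** 3
    dcube = lambda x: 3 * x ** 2
    print(direct_sum(cube, 0.3, 7.6), euler_right_member(cube, dcube, 0.3, 7.6))
    print(direct_sum(math.log, 2, 9), euler_right_member(math.log, lambda x: 1 / x, 2, 9))
```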

Corollary 1. The integral

$$\int_0^x\rho(z)\,dz = \sigma(x)$$

represents a continuous and periodic function of $x$ with the period 1. For

$$\sigma(x + 1) - \sigma(x) = \int_x^{x+1}\rho(z)\,dz = \int_0^1\rho(z)\,dz = \int_0^1\left(\tfrac12 - z\right)dz = 0.$$

If $0 \leq x \leq 1$,

$$\sigma(x) = \int_0^x\left(\tfrac12 - z\right)dz = \frac{x(1 - x)}{2}$$

and in general

$$\sigma(x) = \frac{\theta(1 - \theta)}{2}$$

where $\theta$ is a fractional part of $x$. Hence, for every real $x$

$$0 \leq \sigma(x) \leq \tfrac18.$$

Supposing that $f''(x)$ exists and is continuous in $(a, b)$ and integrating by
parts, we get

$$\int_a^b\rho(x)f'(x)\,dx = \sigma(b)f'(b) - \sigma(a)f'(a) - \int_a^b\sigma(x)f''(x)\,dx,$$

which leads to another form of Euler's formula:

$$\sum_{n > a}^{n \leq b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \sigma(b)f'(b) + \sigma(a)f'(a) + \int_a^b\sigma(x)f''(x)\,dx.$$


Corollary 2. If $f(x)$ is defined for all $x \geq a$ and possesses a continuous
derivative throughout the interval $(a, +\infty)$; if, besides, the integral

$$\int_a^\infty\rho(x)f'(x)\,dx$$

exists, then for a variable limit $b$ we have

$$\sum_{n > a}^{n \leq b} f(n) = C + \int f(b)\,db + \rho(b)f(b) + \int_b^\infty\rho(x)f'(x)\,dx \tag{2}$$

where $C$ is a constant with respect to $b$.

It suffices to substitute for

$$\int_a^b\rho(x)f'(x)\,dx$$

the difference

$$\int_a^\infty\rho(x)f'(x)\,dx - \int_b^\infty\rho(x)f'(x)\,dx$$

and separate the terms depending upon $b$ from those involving $a$.

2. Stirling's Formula. Factorials increase with extreme rapidity
and their exact computation soon becomes practically impossible. The
question then naturally arises of finding a convenient approximate
expression for large factorials, which question is answered by a celebrated
formula usually known as "Stirling's formula," although, in the main,
it was established by de Moivre in connection with problems on
probability. De Moivre did not establish the relation to the number

$$\pi = 3.14159\ldots$$

of the constant involved in his formula; it was done by Stirling.

In formula (2) it suffices to take $a = \frac12$, $f(x) = \log x$, and replace $b$
by an arbitrary integer $n$ to arrive at the remarkable expression

$$\log(1\cdot2\cdot3\cdots n) = C + \left(n + \tfrac12\right)\log n - n + \int_n^\infty\frac{\rho(x)\,dx}{x}$$


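where $C$ is a constant. For the sake of brevity we shall set

$$\omega(n) = \int_n^\infty\frac{\rho(x)\,dx}{x}.$$

Now

$$\int_n^\infty\frac{\rho(x)\,dx}{x} = \int_n^{n+1}\frac{\rho(x)\,dx}{x} + \int_{n+1}^{n+2}\frac{\rho(x)\,dx}{x} + \cdots$$

and

$$\int_k^{k+1}\frac{\rho(x)\,dx}{x} = \int_0^1\frac{\rho(u)\,du}{u + k} = \int_0^{\frac12}\frac{\left(\frac12 - u\right)du}{u + k} + \int_{\frac12}^1\frac{\left(\frac12 - u\right)du}{u + k}$$

$$= \int_0^{\frac12}\frac{\left(\frac12 - u\right)du}{u + k} - \int_0^{\frac12}\frac{\left(\frac12 - u\right)du}{k + 1 - u} = \frac12\int_0^{\frac12}\frac{(1 - 2u)^2\,du}{(k + u)(k + 1 - u)}.$$

Hence

$$\omega(n) = \frac12\int_0^{\frac12}(1 - 2u)^2F_n(u)\,du$$

where

$$F_n(u) = \sum_{k=n}^{\infty}\frac{1}{(k + u)(k + 1 - u)}.$$

Since

$$(k + u)(k + 1 - u) = k(k + 1) + u - u^2,$$

it follows that for $0 < u < \frac12$

$$(k + u)(k + 1 - u) > k(k + 1)$$

$$(k + u)(k + 1 - u) < \left(k + \tfrac12\right)^2 < \left(k + \tfrac12\right)\left(k + \tfrac32\right).$$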

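Thus for $0 < u < \frac12$

$$F_n(u) < \sum_{k=n}^{\infty}\frac{1}{k(k + 1)} = \frac1n$$

$$F_n(u) > \sum_{k=n}^{\infty}\frac{1}{\left(k + \frac12\right)\left(k + \frac32\right)} = \frac{1}{n + \frac12}.$$

Making use of these limits, we find that

$$\omega(n) < \frac{1}{2n}\int_0^{\frac12}(1 - 2u)^2\,du = \frac{1}{12n}$$

$$\omega(n) > \frac{1}{2n + 1}\int_0^{\frac12}(1 - 2u)^2\,du = \frac{1}{12\left(n + \frac12\right)}$$

and consequently can set

$$\omega(n) = \frac{1}{12(n + \theta)}$$

where

$$0 < \theta < \tfrac12.$$

Accordingly

$$\log(1\cdot2\cdot3\cdots n) = C + \left(n + \tfrac12\right)\log n - n + \frac{1}{12(n + \theta)}.$$
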

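The constant $C$ depends in a remarkable way on the number $\pi$.
To show this we start from the well-known expression for $\pi$ due to Wallis:

$$\frac\pi2 = \lim_{n\to\infty}\left(\frac21\cdot\frac23\cdot\frac43\cdot\frac45\cdots\frac{2n}{2n - 1}\cdot\frac{2n}{2n + 1}\right)$$

which follows from the infinite product

$$\sin x = x\prod_{k=1}^{\infty}\left(1 - \frac{x^2}{k^2\pi^2}\right)$$

by taking $x = \pi/2$. Since

$$\frac21\cdot\frac23\cdot\frac43\cdot\frac45\cdots\frac{2n}{2n - 1}\cdot\frac{2n}{2n + 1} = \left[\frac{2\cdot4\cdot6\cdots2n}{1\cdot3\cdot5\cdots(2n - 1)}\right]^2\frac{1}{2n + 1}$$

we get from Wallis' formula

$$\sqrt\pi = \lim_{n\to\infty}\frac{2\cdot4\cdot6\cdots2n}{1\cdot3\cdot5\cdots(2n - 1)}\cdot\frac{1}{\sqrt n}.$$

On the other hand,

$$2\cdot4\cdot6\cdots2n = 2^n\cdot1\cdot2\cdot3\cdots n$$

$$1\cdot3\cdot5\cdots(2n - 1) = \frac{1\cdot2\cdot3\cdots2n}{2^n\cdot1\cdot2\cdot3\cdots n}$$

so that

$$\sqrt\pi = \lim_{n\to\infty}\left\{\frac{2^{2n}(1\cdot2\cdot3\cdots n)^2}{1\cdot2\cdot3\cdots2n}\cdot n^{-\frac12}\right\}$$

or, taking logarithms

$$\log\sqrt\pi = \lim\left[2n\log2 + 2\log(1\cdot2\cdot3\cdots n) - \log(1\cdot2\cdot3\cdots2n) - \tfrac12\log n\right].$$

But, neglecting infinitesimals,

$$\log(1\cdot2\cdot3\cdots n) = C + \left(n + \tfrac12\right)\log n - n$$

$$\log(1\cdot2\cdot3\cdots2n) = C + \left(2n + \tfrac12\right)\log2n - 2n$$

whence

$$\lim\left[2n\log2 + 2\log(1\cdot2\cdot3\cdots n) - \log(1\cdot2\cdot3\cdots2n) - \tfrac12\log n\right] = C - \tfrac12\log2.$$

Thus

$$\log\sqrt\pi = C - \tfrac12\log2, \qquad C = \log\sqrt{2\pi}$$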

and finally

$$\log(1\cdot2\cdot3\cdots n) = \log\sqrt{2\pi} + \left(n + \tfrac12\right)\log n - n + \frac{1}{12(n + \theta)}; \qquad 0 < \theta < \tfrac12. \tag{3}$$

This is equivalent to two inequalities

$$e^{\frac{1}{12\left(n + \frac12\right)}} < \frac{1\cdot2\cdot3\cdots n}{\sqrt{2\pi n}\,n^ne^{-n}} < e^{\frac{1}{12n}}$$

which show that for indefinitely increasing $n$

$$\lim\frac{1\cdot2\cdot3\cdots n}{\sqrt{2\pi n}\,n^ne^{-n}} = 1.$$

This result is commonly known as Stirling's formula.
For a finite $n$ we have

$$1\cdot2\cdot3\cdots n = \sqrt{2\pi n}\,n^ne^{-n}\cdot e^{\omega(n)}$$

where

$$\frac{1}{12\left(n + \frac12\right)} < \omega(n) < \frac{1}{12n}.$$

The expression

$$\sqrt{2\pi n}\,n^ne^{-n}$$

is thus an approximate value of the factorial $1\cdot2\cdot3\cdots n$ for large $n$
in the sense that the ratio of both is near to 1; that is, the relative error is
small. On the contrary, the absolute error will be arbitrarily large for
large $n$, but this is irrelevant when Stirling's approximation is applied
to quotients of factorials.
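The bounds on $\omega(n)$ are easy to observe numerically. In the sketch below (an illustration; the function name `omega` is chosen here, and `math.lgamma` from the Python standard library supplies $\log n!$), $\omega(n)$ is computed as the difference between $\log n!$ and the leading terms of formula (3).

```python
import math

def omega(n):
    """omega(n) = log n! - [log sqrt(2 pi) + (n + 1/2) log n - n]."""
    leading = 0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n
    return math.lgamma(n + 1) - leading

if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        lower = 1 / (12 * (n + 0.5))
        upper = 1 / (12 * n)
        print(n, lower, omega(n), upper)
```

For every $n$ listed, the middle value should fall strictly between the other two, and the relative error of Stirling's approximation, $e^{\omega(n)} - 1$, decreases roughly like $1/12n$.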

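In this connection it is useful to derive two further inequalities.

Let $m < n$; we have, then,

$$F_m(u) - F_n(u) = \sum_{k=m}^{n-1}\frac{1}{(k + u)(k + 1 - u)},$$

and further, supposing $0 < u < \frac12$

$$F_m(u) - F_n(u) < \sum_{k=m}^{n-1}\frac{1}{k(k + 1)} = \frac1m - \frac1n$$

$$F_m(u) - F_n(u) > \sum_{k=m}^{n-1}\frac{1}{\left(k + \frac12\right)\left(k + \frac32\right)} = \frac{1}{m + \frac12} - \frac{1}{n + \frac12}.$$

Hence,

$$\omega(m) - \omega(n) < \frac{1}{12m} - \frac{1}{12n}; \qquad \omega(m) - \omega(n) > \frac{1}{12\left(m + \frac12\right)} - \frac{1}{12\left(n + \frac12\right)}$$

and, if $l$ is a third arbitrary positive integer,

$$\omega(m) + \omega(l) - \omega(n) > \frac{1}{12\left(m + \frac12\right)} + \frac{1}{12\left(l + \frac12\right)} - \frac{1}{12\left(n + \frac12\right)}.$$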

3. Some Definite Integrals. The value of the important definite
integral

$$\int_0^\infty e^{-t^2}\,dt$$

can be found in various ways. One of the simplest is the following: Let

$$J_n = \int_0^\infty e^{-t^2}t^n\,dt$$

in general where $n$ is an arbitrary integer $\geq 0$. Integrating by parts one
can easily establish the recurrence relation

$$J_n = \frac{n - 1}{2}\,J_{n-2}$$

whence

$$J_{2m} = \frac{1\cdot3\cdot5\cdots(2m - 1)}{2^m}\,J_0$$

$$J_{2m+1} = \frac{1\cdot2\cdot3\cdots m}{2}.$$


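On the other hand,

$$J_{n+1} + 2\lambda J_n + \lambda^2J_{n-1} = \int_0^\infty e^{-t^2}t^{n-1}(t + \lambda)^2\,dt,$$

which shows that

$$J_{n+1} + 2\lambda J_n + \lambda^2J_{n-1} > 0$$

for all real $\lambda$. Hence, the roots of the polynomial in the left member are
imaginary, and this implies

$$J_n^2 < J_{n-1}J_{n+1}.$$

Taking $n = 2m$ and $n = 2m + 1$ and using the preceding expression
for $J_{2m}$ and $J_{2m+1}$, we find

$$\frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m - 1)}\cdot\frac{1}{\sqrt{4m + 2}} < J_0 < \frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m - 1)}\cdot\frac{1}{\sqrt{4m}}.$$

But

$$\lim_{m\to\infty}\frac{2\cdot4\cdot6\cdots2m}{1\cdot3\cdot5\cdots(2m - 1)}\cdot\frac{1}{\sqrt m} = \sqrt\pi$$

hence

$$J_0 = \int_0^\infty e^{-t^2}\,dt = \frac{\sqrt\pi}{2}.$$

Here substituting $t = \sqrt a\,u$, where $a$ is a positive parameter, we get

$$\int_0^\infty e^{-au^2}\,du = \frac12\sqrt{\frac\pi a}.$$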

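As a generalization of the last integral we may consider the following one:

$$V = \int_0^\infty e^{-au^2}\cos bu\,du.$$

The simplest way to find the value of this integral is to take the derivative

$$\frac{dV}{db} = -\int_0^\infty e^{-au^2}\sin bu\cdot u\,du$$

and transform the right member by partial integration. The result is

$$\frac{dV}{db} = -\frac{b}{2a}\,V$$

or

$$\frac{d\left(Ve^{\frac{b^2}{4a}}\right)}{db} = 0,$$

whence

$$V = Ce^{-\frac{b^2}{4a}}.$$

To determine the constant $C$, take $b = 0$; then

$$C = \int_0^\infty e^{-au^2}\,du = \frac12\sqrt{\frac\pi a}$$

so that finally

$$\int_0^\infty e^{-au^2}\cos bu\,du = \frac12\sqrt{\frac\pi a}\,e^{-\frac{b^2}{4a}}.$$

The equivalent form of this integral is as follows:

$$\int_{-\infty}^{\infty} e^{-au^2}\cos bu\,du = \int_{-\infty}^{\infty} e^{-au^2 + ibu}\,du = \sqrt{\frac\pi a}\,e^{-\frac{b^2}{4a}}.$$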
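The value of the last integral can be confirmed numerically. The following sketch is only an illustration; the function names and the truncation point $12/\sqrt a$ are assumptions made here. It integrates $e^{-au^2}\cos bu$ over $(0, \infty)$ by Simpson's rule and compares the result with $\frac12\sqrt{\pi/a}\,e^{-b^2/4a}$.

```python
import math

def gaussian_cosine_integral(a, b, steps=20000):
    """Approximate the integral of exp(-a u^2) cos(b u) over (0, infinity), a > 0."""
    upper = 12.0 / math.sqrt(a)  # exp(-a u^2) is below 1e-62 beyond this point

    def g(u):
        return math.exp(-a * u * u) * math.cos(b * u)

    h = upper / steps  # steps must be even for Simpson's rule
    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3

def closed_form(a, b):
    """(1/2) sqrt(pi / a) exp(-b^2 / (4 a))."""
    return 0.5 * math.sqrt(math.pi / a) * math.exp(-b * b / (4 * a))

if __name__ == "__main__":
    for a, b in ((2.0, 3.0), (0.5, 0.0), (1.0, 1.0)):
        print(a, b, gaussian_cosine_integral(a, b), closed_form(a, b))
```

With $b = 0$ the comparison reduces to the integral $\int_0^\infty e^{-au^2}\,du = \frac12\sqrt{\pi/a}$ found earlier in this section.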



APPENDIX II 

METHOD OF MOMENTS AND ITS APPLICATIONS 


1. Introductory Remarks. To prove the fundamental limit theorem
Tshebysheff devised an ingenious method, known as the "method of
moments," which later was completed and simplified by one of the most
prominent among Tshebysheff's disciples, the late Markoff. The
simplicity and elegance inherent in this method of moments make it 
advisable to present in this Appendix a brief exposition of it. 

The distribution of a mass spread over a given interval $(a, b)$ may be
characterized by a never decreasing function $\varphi(x)$, defined in $(a, b)$
and varying from $\varphi(a) = 0$ to $\varphi(b) = m_0$, where $m_0$ is the total mass
contained in $(a, b)$. Since $\varphi(x)$ is never decreasing, for any particular point
$x_0$, both the limits

$$\lim\varphi(x_0 - \epsilon) = \varphi(x_0 - 0)$$
$$\lim\varphi(x_0 + \epsilon) = \varphi(x_0 + 0)$$

exist when a positive number $\epsilon$ tends to 0. Evidently

$$\varphi(x_0 - 0) \leq \varphi(x_0) \leq \varphi(x_0 + 0).$$

If

$$\varphi(x_0 - 0) = \varphi(x_0 + 0) = \varphi(x_0),$$

then $x_0$ is a "point of continuity" of $\varphi(x)$. In case

$$\varphi(x_0 + 0) > \varphi(x_0 - 0),$$

$x_0$ is a point of discontinuity of $\varphi(x)$, and the positive difference

$$\varphi(x_0 + 0) - \varphi(x_0 - 0)$$

may be considered as a mass concentrated at the point $x_0$. In all cases
$\varphi(x_0 - 0)$ is the total mass on the segment $(a, x_0)$ excluding the end point
$x_0$, whereas $\varphi(x_0 + 0)$ is the mass spread over the same segment including
the point $x_0$.

The points of discontinuity, if there are any, form an enumerable set, 
whence it follows that in any part of the interval (a, b) there are points of 
continuity. 

If for any sufficiently small positive $\epsilon$

$$\varphi(x_0 + \epsilon) > \varphi(x_0 - \epsilon),$$

$x_0$ is called a "point of increase" of $\varphi(x)$. There is at least one point of
increase and there might be infinitely many. For instance, if

$$\varphi(x) = 0 \quad \text{for } a \leq x \leq c$$

$$\varphi(x) = m_0 \quad \text{for } c < x \leq b,$$

then $c$ is the only point of increase. On the other hand, for

$$\varphi(x) = m_0\,\frac{x - a}{b - a}$$

every point of the interval $(a, b)$ is a point of increase. In case of a
finite number of points of increase the whole mass is concentrated in
these points and the distribution function $\varphi(x)$ is a step function with a
finite number of steps.

Stieltjes' integrals

$$\int_a^b d\varphi(x) = m_0, \quad \int_a^b x\,d\varphi(x) = m_1, \quad \ldots \quad \int_a^b x^i\,d\varphi(x) = m_i$$

represent respectively the whole mass $m_0$ and its moments about the
origin of the order 1, 2, $\ldots$ $i$. When the distribution function $\varphi(x)$
is given, moments $m_0, m_1, m_2, \ldots m_i$ (provided they exist) are
determined. If, however, these moments are given and are known to originate
in a certain distribution of a mass over $(a, b)$, the question may be raised
with what error the mass spread over an interval $(a, x)$ can be determined
by these data? In other words, given $m_0, m_1, m_2, \ldots m_i$, what are the
precise upper and lower bounds of a mass spread over an interval $(a, x)$?
Such is the question raised by Tshebysheff in a short but important article
"Sur les valeurs limites des intégrales" (1874).¹ The results contained
in this article, including very remarkable inequalities which indeed are of
fundamental importance, are given without proof. The first proof of
these results and the complete solution of the question raised by
Tshebysheff was given by Markoff in his eminent thesis "On some applications
of algebraic continued fractions" (St. Petersburg, 1884), written in
Russian and therefore comparatively little known.

¹ Jour. Liouville, Ser. 2, T. XIX, 1874.

Suppose that $p_i$ is the limit of the error with which we can evaluate the
mass belonging to the interval $(a, x)$ or, which is almost the same, the
value of $\varphi(x)$, when moments $m_0, m_1, m_2, \ldots m_i$ are given. If, with $i$
tending to infinity, $p_i$ tends to 0 for any given $x$, then the distribution
function $\varphi(x)$ will be completely determined by giving all the moments

$$m_0, m_1, m_2, \ldots.$$

One case of this kind, that in which

$$m_0 = 1, \qquad m_{2k} = \frac{1\cdot3\cdot5\cdots(2k - 1)}{2^k}, \qquad m_{2k-1} = 0$$

was considered by Tshebysheff in a later paper, "Sur deux théorèmes
relatifs aux probabilités" (1887)² devoted to the application of his
method to the proof of the limit theorem under certain rather general
conditions. The success of this proof is due to the fact that moments,
as given above, uniquely determine the normal distribution

$$\varphi(x) = \frac{1}{\sqrt\pi}\int_{-\infty}^{x} e^{-u^2}\,du$$

of the mass 1 over the infinite interval $(-\infty, +\infty)$.

After these preliminary remarks and before proceeding to an orderly 
exposition of the method of moments, it is advisable to devote a few pages 
to continued fractions associated with power series, for continued frac- 
tions are the natural tools in questions of the kind we shall consider. 

² Oeuvres complètes de P. L. Tshebysheff, Tome 2, p. 482.

2. Continued Fractions Associated with Power Series. Let

$$\phi(z) = \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}} + \frac{A_3}{z^{\alpha_3}} + \cdots; \qquad (A_1 \neq 0)$$

be a power series arranged according to decreasing powers of $z$ where the
smallest exponent $\alpha_1$ is positive. We consider this power series from a
purely formal point of view merely as a means to form a sequence of
rational fractions

$$\frac{P_1}{Q_1}, \quad \frac{P_2}{Q_2}, \quad \frac{P_3}{Q_3}, \quad \ldots$$

and we need not be concerned about its convergence.

Evidently $1/\phi(z)$ can again be expanded into power series, arranged
according to decreasing powers of $z$. Let its integral part, containing
non-negative powers of $z$, be denoted by $q_1(z)$, and let the fractional part
containing negative powers of $z$, be denoted by $-\phi_1(z)$, so that

$$\frac{1}{\phi(z)} = q_1(z) - \phi_1(z).$$

In the same way

$$\frac{1}{\phi_1(z)}$$

can be represented thus:

$$\frac{1}{\phi_1(z)} = q_2(z) - \phi_2(z)$$

where $q_2(z)$ is a polynomial and $\phi_2(z)$
a power series containing only negative powers of $z$. Further, we shall
have

$$\frac{1}{\phi_2(z)} = q_3(z) - \phi_3(z)$$

with a certain polynomial $q_3(z)$ and a power series $\phi_3(z)$
containing negative powers of $z$, and so on. Thus we are led to consider a
continued fraction (finite or infinite)

$$\cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3 - \cdots}}} \tag{1}$$


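associated with $\phi(z)$ in the sense that the formal expansion of

$$\cfrac{1}{q_1 - \cfrac{1}{q_2 - \cdots - \cfrac{1}{q_i - \phi_i(z)}}}$$

into a power series will reproduce exactly $\phi(z)$. The continued fraction
(1) is again considered from a purely formal standpoint as a mere
abbreviation of the sequence of its convergents

$$\frac{P_1}{Q_1} = \frac{1}{q_1}; \qquad \frac{P_2}{Q_2} = \cfrac{1}{q_1 - \cfrac{1}{q_2}}; \qquad \frac{P_3}{Q_3} = \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3}}}; \qquad \cdots$$
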
The polynomials

$$P_1, P_2, P_3, \ldots$$
$$Q_1, Q_2, Q_3, \ldots$$

can be found step by step by the recurrence relations

$$P_i = q_iP_{i-1} - P_{i-2}, \qquad Q_i = q_iQ_{i-1} - Q_{i-2}; \qquad i = 2, 3, 4, \ldots \tag{2}$$

$$P_1 = 1, \quad P_0 = 0; \qquad Q_1 = q_1, \quad Q_0 = 1$$

from which the following identical relation follows:

$$P_i(z)Q_{i-1}(z) - Q_i(z)P_{i-1}(z) = 1, \tag{3}$$

showing that all fractions

$$\frac{P_i(z)}{Q_i(z)}$$

are irreducible. Evidently degrees of consecutive denominators of
convergents form an increasing sequence and the degree of $Q_i(z)$ is at
least $i$. Since

$$\phi(z) = \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cdots - \cfrac{1}{q_{i+1} - \phi_{i+1}(z)}}}$$

we can write

$$\phi(z) = \frac{P_i\,(q_{i+1} - \phi_{i+1}(z)) - P_{i-1}}{Q_i\,(q_{i+1} - \phi_{i+1}(z)) - Q_{i-1}} = \frac{P_{i+1} - P_i\,\phi_{i+1}(z)}{Q_{i+1} - Q_i\,\phi_{i+1}(z)}$$

in the sense that the formal development of the right-hand member is
identical with $\phi(z)$. By virtue of relation (3)

$$\phi(z) - \frac{P_i}{Q_i} = \frac{1}{Q_i(Q_{i+1} - Q_i\,\phi_{i+1})}.$$

The degree of $Q_i$ being $\lambda_i$ and that of $Q_{i+1}$ being $\lambda_{i+1}$, the expansion of

$$\frac{1}{Q_i(Q_{i+1} - Q_i\,\phi_{i+1})}$$

in a series of descending powers of $z$ begins with the power $1/z^{\lambda_i + \lambda_{i+1}}$.
Hence,

$$\phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots$$

and, since $\lambda_{i+1} \geq \lambda_i + 1$, the expansion of

$$\phi(z) - \frac{P_i}{Q_i}$$

begins with a term of the order $2\lambda_i + 1$ in $1/z$ at least. This property
characterizes the convergents $P_i/Q_i$ completely. For let $P/Q$ be a
rational fraction whose denominator is of the $n$th degree and such that
in the expansion of

$$\phi(z) - \frac PQ$$

the lowest term is of the order $2n + 1$ in $1/z$ at least. Then $P/Q$ coincides
with one of the convergents to the continued fraction (1). Let $i$ be
determined by the condition

$$\lambda_i \leq n < \lambda_{i+1}.$$

Then

$$\phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots$$

$$\phi(z) - \frac PQ = \frac{N}{z^{2n+1}} + \cdots$$

whence in the expansion of

$$\frac PQ - \frac{P_i}{Q_i}$$

the lowest term will be of degree $2n + 1$ or $\lambda_i + \lambda_{i+1}$ in $1/z$. Hence, the
degree of

$$PQ_i - P_iQ$$

in $z$ is not greater than the larger of the two numbers

$$\lambda_i - n - 1 \quad \text{and} \quad n - \lambda_{i+1}$$

which are both negative while

$$PQ_i - P_iQ$$

is a polynomial. Hence, identically,

$$PQ_i - P_iQ = 0$$

or

$$\frac PQ = \frac{P_i}{Q_i}$$

which proves the statement.
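The recurrence relations (2) and the identity (3) can be illustrated with exact arithmetic. The sketch below is not taken from the text: it treats the partial quotients $q_1, q_2, \ldots$ as rational numbers (for instance, values of the polynomials $q_i(z)$ at one fixed rational $z$, with hypothetical $q_i$ chosen only for the example), builds the pairs $(P_i, Q_i)$ by (2), and compares each convergent with the directly evaluated nested fraction.

```python
from fractions import Fraction

def convergents(q):
    """Return [(P_1, Q_1), ..., (P_k, Q_k)] for 1/(q_1 - 1/(q_2 - ...)) via relations (2)."""
    p_prev, q_prev = Fraction(0), Fraction(1)   # P_0, Q_0
    p_cur, q_cur = Fraction(1), Fraction(q[0])  # P_1, Q_1
    out = [(p_cur, q_cur)]
    for qi in q[1:]:
        p_prev, p_cur = p_cur, qi * p_cur - p_prev
        q_prev, q_cur = q_cur, qi * q_cur - q_prev
        out.append((p_cur, q_cur))
    return out

def nested(q):
    """Evaluate 1/(q_1 - 1/(q_2 - ... - 1/q_k)) directly, from the innermost term outward."""
    value = Fraction(q[-1])
    for qi in reversed(q[:-1]):
        value = qi - 1 / value
    return 1 / value

if __name__ == "__main__":
    z = Fraction(7, 3)
    # Hypothetical partial quotients q_i(z), chosen only for illustration.
    quotients = [z - 1, 2 * z + 3, z / 2, z + 5]
    conv = convergents(quotients)
    for i, (p, qq) in enumerate(conv, start=1):
        print(i, p / qq, nested(quotients[:i]))
```

Identity (3) can be observed on the same data: for consecutive pairs the combination $P_iQ_{i-1} - Q_iP_{i-1}$ equals 1 exactly.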

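3. Continued Fraction Associated with $\displaystyle\int_a^b\frac{d\varphi(x)}{z - x}$. Let $\varphi(x)$ be a never
decreasing function characterizing the distribution of a mass over an
interval $(a, b)$. The moments of this distribution up to the moment of
the order $2n$ are represented by integrals

$$m_0 = \int_a^b d\varphi(x), \quad m_1 = \int_a^b x\,d\varphi(x), \quad m_2 = \int_a^b x^2\,d\varphi(x), \quad \ldots \quad m_{2n} = \int_a^b x^{2n}\,d\varphi(x).$$

Let

$$\Delta_0 = m_0; \quad \Delta_1 = \begin{vmatrix} m_0 & m_1 \\ m_1 & m_2 \end{vmatrix}; \quad \Delta_2 = \begin{vmatrix} m_0 & m_1 & m_2 \\ m_1 & m_2 & m_3 \\ m_2 & m_3 & m_4 \end{vmatrix}; \quad \cdots \quad \Delta_n = \begin{vmatrix} m_0 & m_1 & \cdots & m_n \\ m_1 & m_2 & \cdots & m_{n+1} \\ \vdots & \vdots & & \vdots \\ m_n & m_{n+1} & \cdots & m_{2n} \end{vmatrix}.$$
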

If <p{x) has not less than n + 1 points of increase, we must have 


Ao > 0, Ai > 0, • • • An > 0, 

and conversely, if these inequalities are satisfied, $\varphi(x)$ has at least $n + 1$
points of increase. To prove this, consider the quadratic form

$$\phi = \int_a^b (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x)$$

in $n + 1$ variables $t_0, t_1, \ldots, t_n$. Evidently

$$\phi = \sum m_{i+j}\,t_it_j \qquad (i, j = 0, 1, 2, \ldots, n)$$

so that $\Delta_n$ is the determinant of $\phi$ and $\Delta_0, \Delta_1, \ldots, \Delta_{n-1}$ its principal
minors. The form $\phi$ cannot vanish unless $t_0 = t_1 = \cdots = t_n = 0$.
For if $x = \xi$ is a point of increase and $\phi = 0$, we must have also

$$\int_{\xi-\epsilon}^{\xi+\epsilon} (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x) = 0$$

for an arbitrary positive $\epsilon$, whence by the mean value theorem

$$(t_0 + t_1\eta + \cdots + t_n\eta^n)^2 \int_{\xi-\epsilon}^{\xi+\epsilon} d\varphi(x) = 0 \qquad (\xi - \epsilon < \eta < \xi + \epsilon)$$

or

$$t_0 + t_1\eta + \cdots + t_n\eta^n = 0$$

because

$$\int_{\xi-\epsilon}^{\xi+\epsilon} d\varphi(x) > 0.$$

Letting $\epsilon$ converge to 0, we conclude

$$t_0 + t_1\xi + \cdots + t_n\xi^n = 0$$

at any point of increase. Since there are at least $n + 1$ points of increase
the equation

$$t_0 + t_1x + \cdots + t_nx^n = 0$$

would have at least $n + 1$ roots and that necessitates

$$t_0 = t_1 = \cdots = t_n = 0.$$

Hence, the quadratic form $\phi$, which is never negative, can vanish
only if all its variables vanish; that is, $\phi$ is a definite positive form. Its
determinant $\Delta_n$ and all its principal minors $\Delta_{n-1}, \Delta_{n-2}, \ldots, \Delta_0$ must be
positive, which proves the first statement.

Suppose the conditions

$$\Delta_0 > 0,\ \Delta_1 > 0,\ \ldots,\ \Delta_n > 0$$

satisfied and let $\varphi(x)$ have $s < n + 1$ points of increase. Then the
integral representing $\phi$ reduces to a finite sum

$$\phi = p_1(t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n)^2 + p_2(t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n)^2 + \cdots + p_s(t_0 + t_1\xi_s + \cdots + t_n\xi_s^n)^2$$

denoting by $p_1, p_2, \ldots, p_s$ masses concentrated in the $s$ points of
increase $\xi_1, \xi_2, \ldots, \xi_s$. Now, since $s \le n$, constants $t_0, t_1, \ldots, t_n$, not
all zero, can be determined by the system of equations

$$t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n = 0$$
$$t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n = 0$$
$$\cdots$$
$$t_0 + t_1\xi_s + \cdots + t_n\xi_s^n = 0.$$

Thus $\phi$ vanishes when not all variables vanish; hence, its determinant
$\Delta_n = 0$, contrary to hypothesis.
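
The criterion can be checked on a small example. The following Python sketch is only an illustration under our own assumptions: the helper names (`moments`, `det`, `hankel_determinants`) and the three-point distribution are hypothetical and do not come from the text. It computes the determinants exactly for a mass concentrated in three points, so that the first three determinants should be positive and the fourth should vanish.

```python
from fractions import Fraction

def moments(points, masses, count):
    """Return m_0 .. m_{count-1} of a point distribution, in exact arithmetic."""
    return [sum(Fraction(p) * Fraction(x) ** k for x, p in zip(points, masses))
            for k in range(count)]

def det(matrix):
    """Determinant by exact Gaussian elimination with row swaps."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def hankel_determinants(m, n):
    """Determinants Delta_0 .. Delta_n formed from the moments m_0 .. m_{2n}."""
    return [det([[m[i + j] for j in range(k + 1)] for i in range(k + 1)])
            for k in range(n + 1)]

# Hypothetical example: masses 1, 2, 1 placed at the points -1, 0, 2.
m = moments(points=[-1, 0, 2], masses=[1, 2, 1], count=7)
deltas = hankel_determinants(m, 3)
```

With three points of increase, only the determinants up to the second order can be positive, in agreement with the converse statement just proved.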

From now on we shall assume that $\varphi(x)$ has at least $n + 1$ points of
increase. The integral

$$\int_a^b \frac{d\varphi(x)}{z - x}$$

can be expanded into a formal power series of $1/z$, thus

$$\int_a^b \frac{d\varphi(x)}{z - x} = \frac{m_0}{z} + \frac{m_1}{z^2} + \frac{m_2}{z^3} + \cdots + \frac{m_{2n}}{z^{2n+1}} + \cdots$$

and this power series can be converted into a continued fraction as
explained in Sec. 2. Let

$$\frac{P_1}{Q_1},\ \frac{P_2}{Q_2},\ \ldots,\ \frac{P_n}{Q_n},\ \frac{P_{n+1}}{Q_{n+1}}$$

be the first $n + 1$ convergents to that continued fraction. I say that the
degrees of their denominators are, respectively, $1, 2, 3, \ldots, n + 1$.
Since these degrees form an increasing sequence, it suffices to show that
there exists a convergent with the denominator of a given degree

$$s \le n + 1.$$

This convergent $P/Q$ is completely determined by the condition that in a
formal expansion of the difference

$$\int_a^b \frac{d\varphi(x)}{z - x} - \frac{P}{Q}$$

into a power series of $1/z$, terms involving $1/z, 1/z^2, \ldots, 1/z^{2s}$ are
absent. This is the same as to say that in the expansion of

$$Q(z)\int_a^b \frac{d\varphi(x)}{z - x} - P(z)$$

there are no terms involving $1/z, 1/z^2, \ldots, 1/z^s$. The preceding
expression can be written thus:

$$\int_a^b \frac{Q(x)\,d\varphi(x)}{z - x} + \int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x) - P(z).$$

Since

$$\int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x) - P(z)$$

is a polynomial in $z$, it must vanish identically. That gives

$$(4)\qquad P(z) = \int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x).$$


To determine $Q(z)$ we must express the conditions that in the expansion of

$$\int_a^b \frac{Q(x)\,d\varphi(x)}{z - x}$$

terms in $1/z, 1/z^2, \ldots, 1/z^s$ vanish. These conditions are equivalent to
$s$ relations

$$(5)\qquad \int_a^b Q(x)\,d\varphi(x) = 0,\quad \int_a^b xQ(x)\,d\varphi(x) = 0,\ \ldots,\ \int_a^b x^{s-1}Q(x)\,d\varphi(x) = 0,$$

which in turn amount to the single requirement that

$$(6)\qquad \int_a^b \theta(x)Q(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\le s - 1$.

Conversely, if there exists a polynomial $Q(z)$ of degree $s$ satisfying
conditions (5), and $P(z)$ is determined by equation (4), then $P(z)/Q(z)$ is a
convergent whose denominator is of degree $s$. For then the expansion of

$$\int_a^b \frac{d\varphi(x)}{z - x} - \frac{P(z)}{Q(z)}$$

lacks the terms in $1/z, 1/z^2, \ldots, 1/z^{2s}$.
Let

$$Q(z) = l_0 + l_1z + l_2z^2 + \cdots + l_{s-1}z^{s-1} + z^s.$$

Then equations (5) become

$$m_0l_0 + m_1l_1 + m_2l_2 + \cdots + m_{s-1}l_{s-1} + m_s = 0$$
$$m_1l_0 + m_2l_1 + m_3l_2 + \cdots + m_sl_{s-1} + m_{s+1} = 0$$
$$\cdots$$
$$m_{s-1}l_0 + m_sl_1 + m_{s+1}l_2 + \cdots + m_{2s-2}l_{s-1} + m_{2s-1} = 0.$$

This system of linear equations determines completely the coefficients
$l_0, l_1, \ldots, l_{s-1}$ since its determinant $\Delta_{s-1} > 0$.
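
The linear system can be solved mechanically. The sketch below is an illustration only: the function names `solve` and `monic_denominator` are our own, and the moments used are those of the density $e^{-x^2}/\sqrt{\pi}$ quoted later in Sec. 7. It solves equations (5) in exact rational arithmetic for the coefficients of the monic polynomial $Q(z)$ of degree $s$.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    rows = [[Fraction(v) for v in a[i]] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        # The system is assumed nonsingular (its determinant is Delta_{s-1} > 0).
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def monic_denominator(m, s):
    """Coefficients l_0, ..., l_{s-1}, 1 of Q(z), in ascending powers of z."""
    a = [[m[i + j] for j in range(s)] for i in range(s)]
    b = [-m[i + s] for i in range(s)]
    return solve(a, b) + [Fraction(1)]

# Moments m_0 .. m_6 of the density exp(-x^2)/sqrt(pi).
m = [Fraction(1), 0, Fraction(1, 2), 0, Fraction(3, 4), 0, Fraction(15, 8)]
q2 = monic_denominator(m, 2)   # coefficients of z^2 - 1/2
q3 = monic_denominator(m, 3)   # coefficients of z^3 - (3/2) z
```

The resulting polynomials satisfy the relations (5) by construction, which can be confirmed by integrating $x^kQ(x)$ term by term with the help of the moments.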

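The existence of a convergent with the denominator of degree

$$s \le n + 1$$

being established, it follows that the denominator of the $s$th convergent
$P_s/Q_s$ is exactly of degree $s$. The denominator $Q_s$ is determined, except
for a constant factor $c$, and can be presented in the form:

$$Q_s = \frac{c}{\Delta_{s-1}} \begin{vmatrix} 1 & z & z^2 & \cdots & z^s\\ m_0 & m_1 & m_2 & \cdots & m_s\\ m_1 & m_2 & m_3 & \cdots & m_{s+1}\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ m_{s-1} & m_s & m_{s+1} & \cdots & m_{2s-1} \end{vmatrix}.$$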


A remarkable result follows from equation (6) by taking $Q = Q_s$ and
$\theta = Q_{s'}$; namely,

$$(7)\qquad \int_a^b Q_sQ_{s'}\,d\varphi(x) = 0 \quad\text{if}\quad s' \ne s$$

while

$$\int_a^b Q_s^2\,d\varphi(x) > 0 \qquad (s \le n).$$

In the general relation

$$Q_s = q_sQ_{s-1} - Q_{s-2}$$

the polynomial $q_s$ must be of the first degree

$$q_s = \alpha_sz + \beta_s,$$

which shows that the continued fraction associated with

$$\int_a^b \frac{d\varphi(x)}{z - x}$$

has the form

$$\cfrac{1}{\alpha_1z + \beta_1 - \cfrac{1}{\alpha_2z + \beta_2 - \cfrac{1}{\alpha_3z + \beta_3 - \cdots}}}$$

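The next question is, how to determine the constants $\alpha_s$ and $\beta_s$.
Multiplying both members of the equation

$$Q_s = (\alpha_sz + \beta_s)Q_{s-1} - Q_{s-2} \qquad (s \ge 2)$$

by $Q_{s-2}\,d\varphi(z)$, integrating between limits $a$ and $b$, and taking into account
(7), we get

$$0 = \alpha_s\int_a^b zQ_{s-1}Q_{s-2}\,d\varphi(z) - \int_a^b Q_{s-2}^2\,d\varphi(z).$$

On the other hand, the highest terms in $Q_{s-1}$ and $Q_{s-2}$ are

$$\alpha_1\alpha_2\cdots\alpha_{s-1}\,z^{s-1}, \qquad \alpha_1\alpha_2\cdots\alpha_{s-2}\,z^{s-2}.$$

Hence,

$$zQ_{s-2} = \frac{1}{\alpha_{s-1}}Q_{s-1} + \psi$$

where $\psi$ is a polynomial of degree $\le s - 2$. Referring to equation (6),
we have

$$\int_a^b zQ_{s-2}Q_{s-1}\,d\varphi(z) = \frac{1}{\alpha_{s-1}}\int_a^b Q_{s-1}^2\,d\varphi(z)$$

and consequently

$$(8)\qquad \frac{\alpha_s}{\alpha_{s-1}} = \frac{\int_a^b Q_{s-2}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}.$$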


Suppose that the following moments are given: $m_0, m_1, \ldots, m_{2n}$; how
many of the coefficients $\alpha_s$ can be found? Evidently $\alpha_1 = 1/m_0$.
Furthermore, $Q_0 = 1$ and $Q_1$ is completely determined given $m_0$ and $m_1$.
Relation (8) determines $\alpha_2$, and $Q_2$ will be completely determined given
$m_0, m_1, m_2, m_3$. The same relation again determines $\alpha_3$, and $Q_3$ will be
determined given $m_0, m_1, \ldots, m_5$. Proceeding in the same way, we
conclude that, given $m_0, m_1, m_2, \ldots, m_{2n}$, all the polynomials

$$Q_0,\ Q_1,\ Q_2,\ \ldots,\ Q_n$$

as well as constants

$$\alpha_1,\ \alpha_2,\ \alpha_3,\ \ldots,\ \alpha_{n+1}$$

can be determined. It is important to note that all these constants are
positive.

Proceeding in a similar manner, the following expression can be found

$$\beta_s = -\alpha_s\,\frac{\int_a^b zQ_{s-1}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}.$$

It follows that constants

$$\beta_1,\ \beta_2,\ \ldots,\ \beta_n$$

are determined by our data, but not $\beta_{n+1}$. For if $s = n + 1$, the integral

$$\int_a^b zQ_n^2\,d\varphi(z)$$

can be expressed as a linear function of $m_0, m_1, \ldots, m_{2n+1}$ with known
coefficients. But $m_{2n+1}$ is not included among our data; hence, $\beta_{n+1}$
cannot be determined.
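
The procedure just described lends itself to a short computation. The following Python sketch is offered as an illustration under stated assumptions: the names `integrate`, `multiply` and `recurrence` are ours, polynomials are stored as coefficient lists in ascending powers, and integrals against $d\varphi$ are evaluated term by term from the given moments. It uses $\alpha_1 = 1/m_0$, relation (8), and the expression for $\beta_s$, and it stops before $\beta_{n+1}$, which would require $m_{2n+1}$.

```python
from fractions import Fraction

def integrate(poly, m):
    """Integral of a polynomial against d(phi), computed from the moments m_k."""
    return sum(c * m[k] for k, c in enumerate(poly))

def multiply(p, q):
    """Product of two polynomials given by ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def recurrence(m, n):
    """Return (alpha_1..alpha_{n+1}, beta_1..beta_n, Q_0..Q_n) from m_0..m_{2n}."""
    q_prev, q_cur = [Fraction(0)], [Fraction(1)]       # Q_{-1} = 0, Q_0 = 1
    alphas, betas, qs = [], [], [q_cur]
    alpha = 1 / Fraction(m[0])                          # alpha_1 = 1/m_0
    prev_norm = None
    for s in range(1, n + 2):
        square = multiply(q_cur, q_cur)
        norm = integrate(square, m)                     # integral of Q_{s-1}^2
        if s > 1:
            alpha = alphas[-1] * prev_norm / norm       # relation (8)
        alphas.append(alpha)
        if s == n + 1:
            break                                       # beta_{n+1} needs m_{2n+1}
        beta = -alpha * integrate([Fraction(0)] + square, m) / norm
        betas.append(beta)
        q_next = multiply([beta, alpha], q_cur)         # (alpha_s z + beta_s) Q_{s-1}
        for k, c in enumerate(q_prev):
            q_next[k] -= c                              # ... minus Q_{s-2}
        q_prev, q_cur, prev_norm = q_cur, q_next, norm
        qs.append(q_cur)
    return alphas, betas, qs

# Moments m_0 .. m_4 of the density exp(-x^2)/sqrt(pi), so that n = 2.
m = [Fraction(1), 0, Fraction(1, 2), 0, Fraction(3, 4)]
alphas, betas, qs = recurrence(m, 2)
```

For these moments the sketch gives $\alpha_1 = 1$, $\alpha_2 = 2$, $\alpha_3 = 1/2$, all $\beta_s = 0$, and $Q_2 = 2z^2 - 1$.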

4. Properties of Polynomials $Q_s$. Theorem. Roots of the equation

$$Q_s(z) = 0 \qquad (s \le n)$$

are real, simple, and contained within the interval $(a, b)$.

Proof. Let $Q_s(z)$ change its sign $r < s$ times when $z$ passes through
points $z_1, z_2, \ldots, z_r$ contained strictly within $(a, b)$. Setting

$$\theta(z) = (z - z_1)(z - z_2)\cdots(z - z_r)$$

the product

$$\theta(z)Q_s(z)$$

does not change its sign when $z$ increases from $a$ to $b$. However,

$$\int_a^b \theta(z)Q_s(z)\,d\varphi(z) = 0,$$

and this necessitates that

$$\theta(z)Q_s(z)$$

or $Q_s(z)$ vanishes in all points of increase of $\varphi(z)$. But this is impossible,
since by hypothesis there are at least $n + 1$ points of increase, whereas
the degree $s$ of $Q_s$ does not exceed $n$. Consequently, $Q_s(z)$ changes its
sign in the interval $(a, b)$ exactly $s$ times and has all its roots real, simple,
and located within $(a, b)$.

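It follows from this theorem that the convergent

$$\frac{P_n(z)}{Q_n(z)}$$

can be resolved into a sum of simple fractions as follows:

$$(9)\qquad \frac{P_n(z)}{Q_n(z)} = \frac{A_1}{z - z_1} + \frac{A_2}{z - z_2} + \cdots + \frac{A_n}{z - z_n}$$

where $z_1, z_2, \ldots, z_n$ are roots of the equation $Q_n(z) = 0$ and in general

$$A_k = \frac{P_n(z_k)}{Q_n'(z_k)}.$$

The right member of (9) can be expanded into power series of $1/z$, the
coefficient of $1/z^k$ being

$$\sum_{\alpha=1}^{n} A_\alpha z_\alpha^{k-1}.$$

By the property of convergents we must have the following equations:

$$\sum_{\alpha=1}^{n} A_\alpha = m_0$$
$$\sum_{\alpha=1}^{n} A_\alpha z_\alpha = m_1$$
$$\cdots$$
$$\sum_{\alpha=1}^{n} A_\alpha z_\alpha^{2n-1} = m_{2n-1}.$$

These equations can be condensed into one,

$$(10)\qquad \sum_{\alpha=1}^{n} A_\alpha T(z_\alpha) = \int_a^b T(z)\,d\varphi(z)$$

which should hold for any polynomial $T(z)$ of degree $\le 2n - 1$.
Let us take for $T(z)$ a polynomial of degree $2n - 2$:

$$T(z) = \left[\frac{Q_n(z)}{(z - z_\alpha)\,Q_n'(z_\alpha)}\right]^2.$$

Then

$$T(z_\alpha) = 1, \qquad T(z_\beta) = 0 \quad\text{if}\quad \beta \ne \alpha,$$

and consequently, by virtue of equation (10),

$$A_\alpha = \int_a^b \left[\frac{Q_n(z)}{(z - z_\alpha)\,Q_n'(z_\alpha)}\right]^2 d\varphi(z) > 0.$$

Thus constants $A_1, A_2, \ldots, A_n$ are all positive, which shows that $P_n(z_\alpha)$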

APPENDIX II 




has the same sign as $Q_n'(z_\alpha)$. Now in the sequence

$$Q_n'(z_1),\ Q_n'(z_2),\ \ldots,\ Q_n'(z_n)$$

any two consecutive terms are of opposite signs. The same being true of
the sequence

$$P_n(z_1),\ P_n(z_2),\ \ldots,\ P_n(z_n),$$

it follows that the roots of $P_n(z)$ are all simple, real, and located in the
intervals

$$(z_1, z_2),\ (z_2, z_3),\ \ldots,\ (z_{n-1}, z_n).$$

Finally, we shall prove the following theorem:

Theorem. For any real $x$

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x)$$

is a positive number.

Proof. From the relations

$$Q_s(z) = (\alpha_sz + \beta_s)Q_{s-1}(z) - Q_{s-2}(z)$$
$$Q_s(x) = (\alpha_sx + \beta_s)Q_{s-1}(x) - Q_{s-2}(x)$$

it follows that

$$\frac{Q_s(z)Q_{s-1}(x) - Q_s(x)Q_{s-1}(z)}{z - x} = \alpha_sQ_{s-1}(z)Q_{s-1}(x) + \frac{Q_{s-1}(z)Q_{s-2}(x) - Q_{s-1}(x)Q_{s-2}(z)}{z - x}$$

whence, taking $s = 1, 2, 3, \ldots, n$ and adding results,

$$\frac{Q_n(z)Q_{n-1}(x) - Q_n(x)Q_{n-1}(z)}{z - x} = \sum_{s=1}^{n} \alpha_sQ_{s-1}(z)Q_{s-1}(x).$$

It suffices now to take $z = x$ to arrive at the identity

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) = \sum_{s=1}^{n} \alpha_sQ_{s-1}(x)^2.$$

Since $Q_0 = 1$ and $\alpha_s > 0$, it is evident that

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) > 0$$

for every real $x$.
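
The identity can be verified on particular polynomials. The following Python sketch is an illustration only, with function names of our own choosing; it builds $Q_0, \ldots, Q_n$ from assumed constants $\alpha_s > 0$, $\beta_s$ by the recurrence relation and compares both members of the identity at a given point, in exact arithmetic. The sample constants are those belonging to the density $e^{-x^2}/\sqrt{\pi}$ treated in Sec. 7.

```python
from fractions import Fraction

def evaluate(poly, x):
    """Value at x of a polynomial given by ascending coefficients."""
    return sum(c * x ** k for k, c in enumerate(poly))

def derivative(poly):
    """Ascending coefficients of the derivative."""
    return [k * c for k, c in enumerate(poly)][1:] or [Fraction(0)]

def build(alphas, betas):
    """Q_0..Q_n from Q_s = (alpha_s z + beta_s) Q_{s-1} - Q_{s-2}, Q_{-1} = 0, Q_0 = 1."""
    qs = [[Fraction(1)]]
    prev = [Fraction(0)]
    for a, b in zip(alphas, betas):
        cur = qs[-1]
        nxt = [Fraction(0)] * (len(cur) + 1)
        for k, c in enumerate(cur):
            nxt[k] += b * c
            nxt[k + 1] += a * c
        for k, c in enumerate(prev):
            nxt[k] -= c
        prev = cur
        qs.append(nxt)
    return qs

def both_members(alphas, betas, x):
    """Left and right members of the identity for n = len(alphas)."""
    qs = build(alphas, betas)
    n = len(alphas)
    left = (evaluate(derivative(qs[n]), x) * evaluate(qs[n - 1], x)
            - evaluate(derivative(qs[n - 1]), x) * evaluate(qs[n], x))
    right = sum(a * evaluate(qs[s], x) ** 2 for s, a in enumerate(alphas))
    return left, right

alphas = [Fraction(1), Fraction(2), Fraction(1, 2), Fraction(4, 3)]
betas = [Fraction(0)] * 4
checks = [both_members(alphas, betas, Fraction(x, 3)) for x in range(-6, 7)]
```

Each pair in `checks` consists of two equal positive numbers, as the theorem requires.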

5. Equivalent Point Distributions. If the whole mass can be con-
centrated in a finite number of points so as to produce the same $l$ first
moments as a given distribution, we have an equivalent point distribu- 





INTRODUCTION TO MATHEMATICAL PROBABILITY 


tion” in respect to the I first moments. In what follows we shall suppose 
that the whole mass is spread over an infinite interval $(-\infty, \infty)$ and that
the given moments, originating in a distribution with at least $n + 1$
points of increase, are

$$m_0,\ m_1,\ m_2,\ \ldots,\ m_{2n}.$$

The question is: Is it possible to find an equivalent point distribution
where the whole mass is concentrated in $n + 1$ points? Let the unknown
points be

$$\xi_1,\ \xi_2,\ \ldots,\ \xi_{n+1}$$

and the masses concentrated in them

$$A_1,\ A_2,\ \ldots,\ A_{n+1}.$$


Evidently the question will be answered in the affirmative if the system
of $2n + 1$ equations

$$(A)\qquad \begin{aligned} \sum_{\alpha=1}^{n+1} A_\alpha &= m_0\\ \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha &= m_1\\ \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha^2 &= m_2\\ &\cdots\\ \sum_{\alpha=1}^{n+1} A_\alpha\xi_\alpha^{2n} &= m_{2n} \end{aligned}$$

can be satisfied by real numbers $\xi_1, \xi_2, \ldots, \xi_{n+1}$; $A_1, A_2, \ldots, A_{n+1}$,
the last $n + 1$ numbers being positive. The number of unknowns being
greater by one unit than the number of equations, we can introduce the
additional requirement that one of the numbers $\xi_1, \xi_2, \ldots, \xi_{n+1}$ should
be equal to a given real number $v$. The system (A) may be replaced by
the single requirement that the equation

$$(11)\qquad \sum_{\alpha=1}^{n+1} A_\alpha T(\xi_\alpha) = \int_{-\infty}^{\infty} T(x)\,d\varphi(x)$$

shall hold for any polynomial $T(x)$ of degree $\le 2n$. Let $Q(x)$ be the
polynomial of degree $n + 1$ having roots $\xi_1, \xi_2, \ldots, \xi_{n+1}$ and let $\theta(x)$ be
an arbitrary polynomial of degree $\le n - 1$. Then we can apply equation
(11) to

$$T(x) = \theta(x)Q(x).$$

Since $Q(\xi_\alpha) = 0$, we shall have

$$(12)\qquad \int_{-\infty}^{\infty} \theta(x)Q(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\le n - 1$. Presently we shall
see that requirement (12) together with $Q(v) = 0$ determines $Q(x)$, save
for a constant factor, if

$$Q_n(v) \ne 0.$$

Dividing $Q(x)$ by $Q_n(x)$, we have identically

$$Q(x) = (\lambda x + \mu)Q_n(x) + R_{n-1}(x)$$

where $R_{n-1}(x)$ is a polynomial of degree $\le n - 1$. If $\theta(x)$ is an
arbitrary polynomial of degree $\le n - 2$,

$$(\lambda x + \mu)\theta(x)$$

will be of degree $\le n - 1$. Hence

$$\int_{-\infty}^{\infty} (\lambda x + \mu)\theta(x)Q_n(x)\,d\varphi(x) = 0$$

by (6), and (12) shows that

$$\int_{-\infty}^{\infty} \theta(x)R_{n-1}(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\le n - 2$. The last
requirement shows that $R_{n-1}(x)$ differs from $Q_{n-1}(x)$ by a constant factor. Since
the highest coefficient in $Q(x)$ is arbitrary, we can set

$$R_{n-1}(x) = -Q_{n-1}(x).$$

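In the equation

$$Q(x) = (\lambda x + \mu)Q_n(x) - Q_{n-1}(x)$$

it remains to determine constants $\lambda$ and $\mu$. Multiplying both members by
$Q_{n-1}(x)\,d\varphi(x)$ and integrating between $-\infty$ and $\infty$, we get

$$\lambda\int_{-\infty}^{\infty} xQ_{n-1}Q_n\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x).$$

But

$$\int_{-\infty}^{\infty} xQ_{n-1}Q_n\,d\varphi(x) = \frac{1}{\alpha_n}\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x),$$

so that

$$\frac{\lambda}{\alpha_n}\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x),$$

whence

$$\lambda = \alpha_n\,\frac{\int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x)}{\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x)} = \alpha_{n+1}.$$

The equation

$$0 = Q(v) = (\alpha_{n+1}v + \mu)Q_n(v) - Q_{n-1}(v)$$

serves to determine $\mu$ if $Q_n(v) \ne 0$. The final expression of $Q(x)$ will be

$$Q(x) = \left(\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right)Q_n(x) - Q_{n-1}(x).$$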

Owing to recurrence relations

$$Q_2 = (\alpha_2x + \beta_2)Q_1 - Q_0;\quad Q_3 = (\alpha_3x + \beta_3)Q_2 - Q_1;\ \ldots;\quad Q_n = (\alpha_nx + \beta_n)Q_{n-1} - Q_{n-2},$$

it is evident that

$$Q,\ Q_n,\ Q_{n-1},\ \ldots,\ Q_1,\ Q_0 = 1$$

is a Sturm series. For $x = -\infty$ it contains $n + 1$ variations and for
$x = \infty$ only permanences. It follows that the equation

$$Q(x) = 0$$

has exactly $n + 1$ distinct real roots and among them $v$. Thus, if the
problem is solvable, the numbers $\xi_1, \xi_2, \ldots, \xi_{n+1}$ are determined as
roots of

$$Q(x) = 0.$$

Furthermore, all unknowns $A_\alpha$ will be positive. In fact, from equation
(11) it follows that

$$A_\alpha = \int_{-\infty}^{\infty} \left[\frac{Q(x)}{(x - \xi_\alpha)\,Q'(\xi_\alpha)}\right]^2 d\varphi(x) > 0.$$


Now we must show that constants $A_\alpha$ can actually be determined so as
to satisfy equations (A). To this end let

$$P(x) = \int_{-\infty}^{\infty} \frac{Q(x) - Q(z)}{x - z}\,d\varphi(z) = \left(\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right)P_n(x) - P_{n-1}(x).$$

Then

$$Q(x)\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - P(x) = \int_{-\infty}^{\infty} \frac{Q(z)\,d\varphi(z)}{x - z}$$

and, on account of (12), the expansion of the right member into power
series of $1/x$ lacks the terms in $1/x, 1/x^2, \ldots, 1/x^n$. Hence, the
expansion of

$$\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - \frac{P(x)}{Q(x)}$$

lacks the terms in $1/x, 1/x^2, \ldots, 1/x^{2n+1}$; that is,

$$\frac{P(x)}{Q(x)} = \frac{m_0}{x} + \frac{m_1}{x^2} + \cdots + \frac{m_{2n}}{x^{2n+1}} + \cdots.$$

On the other hand, resolving in simple fractions,

$$\frac{P(x)}{Q(x)} = \frac{A_1}{x - \xi_1} + \frac{A_2}{x - \xi_2} + \cdots + \frac{A_{n+1}}{x - \xi_{n+1}}.$$

Expanding the right member into power series of $1/x$ and comparing
with the preceding expansion, we obtain the system (A). By the previous
remark all constants $A_\alpha$ are positive. Thus, there exists a point
distribution in which masses concentrated in $n + 1$ points produce moments
$m_0, m_1, \ldots, m_{2n}$. One of these points $v$ may be taken arbitrarily, with
the condition

$$Q_n(v) \ne 0$$

being observed, however.
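
The simplest case $n = 1$ can be worked out explicitly. The sketch below is an illustration under our own assumptions: the function name `two_point_equivalent` is hypothetical, and only the formulas of this section with $n = 1$ (so that $Q_0 = 1$, $P_0 = 0$, $P_1 = 1$) are used. Given $m_0, m_1, m_2$ and a prescribed point $v$, it returns the two points and masses of the equivalent point distribution, in exact arithmetic.

```python
from fractions import Fraction

def two_point_equivalent(m0, m1, m2, v):
    """Points and masses reproducing m0, m1, m2, one point being v (case n = 1)."""
    m0, m1, m2, v = map(Fraction, (m0, m1, m2, v))
    alpha1 = 1 / m0
    beta1 = -alpha1 * m1 / m0                 # beta_1 from the formula of Sec. 3
    q1_at_v = alpha1 * v + beta1              # Q_1(v)
    if q1_at_v == 0:
        raise ValueError("v must not be a root of Q_1")
    # Integral of Q_1^2 expressed through the moments, then relation (8).
    norm1 = alpha1 ** 2 * m2 + 2 * alpha1 * beta1 * m1 + beta1 ** 2 * m0
    alpha2 = alpha1 * m0 / norm1
    mu = 1 / q1_at_v - alpha2 * v             # so that Q(v) = 0
    # Q(x) = (alpha2 x + mu) Q_1(x) - Q_0 = a x^2 + b x + c
    a = alpha2 * alpha1
    b = alpha2 * beta1 + mu * alpha1
    other = -b / a - v                        # the two roots add up to -b/a

    def mass(x):
        # A = P(x)/Q'(x) with P(x) = (alpha2 x + mu) P_1(x) - P_0(x) = alpha2 x + mu
        return (alpha2 * x + mu) / (2 * a * x + b)

    return [(v, mass(v)), (other, mass(other))]

points = two_point_equivalent(1, 0, 1, Fraction(1, 2))
```

For $m_0 = 1$, $m_1 = 0$, $m_2 = 1$ and $v = 1/2$ the second point is $-2$, with masses $4/5$ and $1/5$; both masses are positive and the three given moments are reproduced.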

6. Tshebysheff’s Inequalities. In a note referred to in the introduc- 
tion Tshebysheff made known certain inequalities of the utmost impor- 
tance for the theory we are concerned with. The first very ingenious 
proof of them was given by Markoff in 1884 and, by a remarkable 
coincidence, the same proof was rediscovered almost at the same time 
by Stieltjes. A few years later, Stieltjes found another totally different 
proof; and it is this second proof that we shall follow. 

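Let $\varphi(x)$ be a distribution function of a mass spread over the interval
$(-\infty, \infty)$. Supposing that a moment of the order $i$,

$$\int_{-\infty}^{\infty} x^i\,d\varphi(x) = m_i,$$

exists, we shall show first that

$$\lim\, l^i\,(m_0 - \varphi(l)) = 0$$
$$\lim\, l^i\,\varphi(-l) = 0$$

when $l$ tends to $+\infty$. For

$$\int_l^{\infty} x^i\,d\varphi(x) \ge l^i\int_l^{\infty} d\varphi(x) = l^i\,(m_0 - \varphi(l))$$

or

$$l^i\,(m_0 - \varphi(l)) \le \int_l^{\infty} x^i\,d\varphi(x).$$

Similarly

$$\left|\int_{-\infty}^{-l} x^i\,d\varphi(x)\right| \ge l^i\,\varphi(-l)$$

or

$$l^i\,\varphi(-l) \le \left|\int_{-\infty}^{-l} x^i\,d\varphi(x)\right|.$$

Now both integrals

$$\int_l^{\infty} x^i\,d\varphi(x) \quad\text{and}\quad \int_{-\infty}^{-l} x^i\,d\varphi(x)$$

converge to 0 as $l$ tends to $+\infty$; whence both statements follow
immediately. Integrating by parts, we have

$$\int_0^l x^i\,d\varphi(x) = l^i\,[\varphi(l) - m_0] - i\int_0^l [\varphi(x) - m_0]\,x^{i-1}\,dx$$
$$\int_{-l}^0 x^i\,d\varphi(x) = (-1)^{i-1}\,l^i\,\varphi(-l) - i\int_{-l}^0 x^{i-1}\varphi(x)\,dx,$$

whence, letting $l$ converge to $+\infty$,

$$m_i = \int_{-\infty}^{\infty} x^i\,d\varphi(x) = -i\int_0^{\infty} [\varphi(x) - m_0]\,x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\varphi(x)\,dx.$$

If the same mass $m_0$, with the same moment $m_i$, is spread according to
the law characterized by the function $\psi(x)$, we shall have

$$m_i = \int_{-\infty}^{\infty} x^i\,d\psi(x) = -i\int_0^{\infty} [\psi(x) - m_0]\,x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\psi(x)\,dx,$$

whence

$$(13)\qquad \int_{-\infty}^{\infty} x^{i-1}\,[\varphi(x) - \psi(x)]\,dx = 0.$$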

Suppose the moments

$$m_0,\ m_1,\ m_2,\ \ldots,\ m_{2n}$$

of the distribution characterized by $\varphi(x)$ are known. Provided $\varphi(x)$
has at least $n + 1$ points of increase, there exists an equivalent point
distribution, defined in Sec. 5 and characterized by the step function
$\psi(x)$ which can be defined as follows:

$$\begin{aligned} \psi(x) &= 0 && \text{for } -\infty < x < \xi_1\\ \psi(x) &= A_1 && \text{for } \xi_1 \le x < \xi_2\\ \psi(x) &= A_1 + A_2 && \text{for } \xi_2 \le x < \xi_3\\ &\cdots\\ \psi(x) &= A_1 + A_2 + \cdots + A_n && \text{for } \xi_n \le x < \xi_{n+1}\\ \psi(x) &= A_1 + A_2 + \cdots + A_{n+1} && \text{for } \xi_{n+1} \le x < +\infty \end{aligned}$$

provided roots $\xi_1, \xi_2, \ldots, \xi_{n+1}$ of the equation $Q(x) = 0$ are arranged
in an increasing order of magnitude.
Equation (13) will hold for $i = 1, 2, 3, \ldots, 2n$ or, which is the
same, the equation

$$(14)\qquad \int_{-\infty}^{\infty} \theta(x)\,[\varphi(x) - \psi(x)]\,dx = 0$$

will hold for an arbitrary polynomial $\theta(x)$ of degree $\le 2n - 1$. The
function

$$h(x) = \varphi(x) - \psi(x)$$

in general has ordinary discontinuities. We can prove now that $h(x)$, if
not identically equal to 0 at all points of continuity, changes its sign at
least $2n$ times.¹ Suppose, on the contrary, that it changes sign $r < 2n$
times; namely, at the points

$$a_1,\ a_2,\ \ldots,\ a_r.$$

Taking

$$\theta(x) = (x - a_1)(x - a_2)\cdots(x - a_r),$$

equation (14) will be satisfied, while the integrand

$$\theta(x)h(x),$$

if not 0, will be of the same sign, for example, positive. Let $\xi$ be any
point of continuity of $h(x)$. If $\xi = a_i$ $(i = 1, 2, \ldots, r)$ then $h(a_i) = 0$
since $h(x)$ changes sign at $a_i$. If $\xi$ does not coincide with any one of the
numbers $a_1, a_2, \ldots, a_r$ then for an arbitrarily small positive $\epsilon$ we must
have

$$\int_{\xi-\epsilon}^{\xi+\epsilon} \theta(x)h(x)\,dx = 0.$$

But by continuity

$$\theta(x)h(x)$$

remains in the interval $(\xi - \epsilon, \xi + \epsilon)$ for sufficiently small $\epsilon$ above a
certain positive number unless $h(\xi) = 0$. Thus, if $h(x)$ does not vanish
at all points of continuity (in which case $\varphi(x)$ and $\psi(x)$ do not differ
essentially), it must change sign at least $2n$ times. Let us see now where
the change of sign can occur. In the intervals

$$(-\infty, \xi_1) \quad\text{and}\quad (\xi_{n+1}, +\infty)$$

$\varphi(x) - \psi(x)$ evidently cannot change sign. Within each of the intervals

$$(\xi_1, \xi_2),\ (\xi_2, \xi_3),\ \ldots,\ (\xi_n, \xi_{n+1})$$

there can be at most one change of sign, since $\psi(x)$ remains constant
there, and $\varphi(x)$ can only increase. The sign may change also at the
points of discontinuity of $\psi(x)$; that is, at the points $\xi_1, \xi_2, \ldots, \xi_{n+1}$.

Altogether, $\varphi(x) - \psi(x)$ cannot change sign more than $2n + 1$ times
and not less than $2n$ times.

¹ A function $f(x)$ is said to change sign once in $(a, b)$ if in this interval there
exists a point or points $c$ such that, for instance, $f(x) \ge 0$ in $(a, c)$ and $f(x) \le 0$ in
$(c, b)$, equality signs not holding throughout the respective intervals. The change
of sign occurs $n$ times if $(a, b)$ can be divided in $n$ intervals in which $f(x)$ changes
sign once.

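Since $\psi(x) = 0$ so far as $x < \xi_1$ and $\varphi(\xi_1 - \epsilon)$ is not negative for
positive $\epsilon$, we must have

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) \ge 0.$$

Also $\psi(x) = m_0$ for $x > \xi_{n+1}$ and $\varphi(x) \le m_0$, so that

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) \le 0.$$

At first let us suppose

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0, \qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

In this case $\varphi(x) - \psi(x)$ must change sign an odd number of times; that is,
not less than $2n + 1$ times. Since this cannot happen more than $2n + 1$
times, the number of times $\varphi(x) - \psi(x)$ changes its sign must be exactly
$2n + 1$. These changes occur once within each interval

$$(\xi_1, \xi_2),\ (\xi_2, \xi_3),\ \ldots,\ (\xi_n, \xi_{n+1})$$

and in each of the points $\xi_1, \xi_2, \ldots, \xi_{n+1}$. When the change of sign
occurs in the interval $(\xi_{i-1}, \xi_i)$ where $\psi(x)$ remains constant, because $\varphi(x)$
never decreases, we must have for sufficiently small $\epsilon$

$$(15)\qquad \varphi(\xi_i - \epsilon) - \psi(\xi_i - \epsilon) > 0.$$

But the sign changes in passing the point $\xi_i$; therefore,

$$(16)\qquad \varphi(\xi_i + \epsilon) - \psi(\xi_i + \epsilon) < 0.$$

The equalities

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0, \qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

cannot both hold for all sufficiently small $\epsilon$. For then there would not
be a change of sign at $\xi_1$ and $\xi_{n+1}$, so that the number of changes would
not be greater than $2n - 1$, which is impossible. Therefore, let

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0 \quad\text{and}\quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

Then there will be exactly $2n$ changes of sign: one in each of the intervals

$$(\xi_1, \xi_2),\ (\xi_2, \xi_3),\ \ldots,\ (\xi_n, \xi_{n+1})$$

and in each of the points $\xi_2, \xi_3, \ldots, \xi_{n+1}$. The inequalities (15) and
(16) would hold for $i \ge 2$, but

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0, \qquad \varphi(\xi_1 + \epsilon) - \psi(\xi_1 + \epsilon) < 0$$

for all sufficiently small $\epsilon$.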

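Now let

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0 \quad\text{and}\quad \varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0$$

for all sufficiently small positive $\epsilon$. Then there will be exactly $2n$ changes
of sign: in each of the points $\xi_1, \xi_2, \ldots, \xi_n$ and in each of the $n$ intervals

$$(\xi_1, \xi_2),\ (\xi_2, \xi_3),\ \ldots,\ (\xi_n, \xi_{n+1}).$$

The inequalities (15) and (16) will again hold for $i \le n$, but

$$\varphi(\xi_{n+1} - \epsilon) - \psi(\xi_{n+1} - \epsilon) > 0 \quad\text{and}\quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

for all sufficiently small $\epsilon$. Letting $\epsilon$ converge to 0, we shall have

$$\varphi(\xi_i - 0) \ge \psi(\xi_i - 0)$$
$$\varphi(\xi_i + 0) \le \psi(\xi_i + 0)$$

for $i = 1, 2, 3, \ldots, n + 1$ in all cases. Then, since

$$\varphi(\xi_i) \ge \varphi(\xi_i - 0); \qquad \varphi(\xi_i) \le \varphi(\xi_i + 0),$$

we shall have also

$$\varphi(\xi_i) \ge \psi(\xi_i - 0)$$
$$\varphi(\xi_i) \le \psi(\xi_i + 0)$$

or, taking into consideration the definition of the function $\psi(x)$,

$$\varphi(\xi_i) \ge \sum_{l=1}^{i-1} \frac{P(\xi_l)}{Q'(\xi_l)}$$
$$\varphi(\xi_i) \le \sum_{l=1}^{i} \frac{P(\xi_l)}{Q'(\xi_l)}.$$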

These are the inequalities to which Tshebysheff's name is justly
attached. For a particular root $\xi_i = v$ they can be written thus:

$$(17)\qquad \begin{aligned} \varphi(v) &\ge \sum_{\xi_l < v} \frac{P(\xi_l)}{Q'(\xi_l)}\\ \varphi(v) &\le \sum_{\xi_l \le v} \frac{P(\xi_l)}{Q'(\xi_l)} \end{aligned}$$

with the evident meaning of the extent of summations. Another, less
explicit, form of the same inequalities is

$$(18)\qquad \begin{aligned} \varphi(v) &\ge \psi(v - 0)\\ \varphi(v) &\le \psi(v + 0). \end{aligned}$$

As to $P(x)$ and $Q(x)$, they can be taken in the form:

$$P(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]\,P_n(x) - Q_n(v)P_{n-1}(x)$$

$$Q(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]\,Q_n(x) - Q_n(v)Q_{n-1}(x).$$

Thus far we have assumed that $v$ was different from any root of the
equation

$$Q_n(x) = 0,$$

but all the results hold, even if

$$Q_n(v) = 0.$$

To prove this, we note first that when a variable $v$ approaches a root $\xi$ of
$Q_n(x)$, one root of $Q(x)$ (either $\xi_1$ or $\xi_{n+1}$) tends to $-\infty$ or $+\infty$, while the
remaining $n$ roots approach the $n$ roots $x_1, x_2, \ldots, x_n$ of the equation

$$Q_n(x) = 0.$$

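If $\xi_1$ tends to negative infinity, it is easy to see that

$$\frac{P(\xi_1)}{Q'(\xi_1)}$$

tends to 0. In this case the other quotients

$$\frac{P(\xi_l)}{Q'(\xi_l)}; \qquad l = 2, 3, \ldots, n + 1$$

tend respectively to

$$\frac{P_n(x_1)}{Q_n'(x_1)},\ \frac{P_n(x_2)}{Q_n'(x_2)},\ \ldots,\ \frac{P_n(x_n)}{Q_n'(x_n)}.$$

If $\xi_{n+1}$ tends to positive infinity the quotients

$$\frac{P(\xi_l)}{Q'(\xi_l)}; \qquad l = 1, 2, \ldots, n$$

approach respectively

$$\frac{P_n(x_l)}{Q_n'(x_l)}; \qquad l = 1, 2, \ldots, n$$

while

$$\frac{P(\xi_{n+1})}{Q'(\xi_{n+1})}$$

tends to 0. Now take $v = \xi - \epsilon$ and $v = \xi + \epsilon$ in (17) and let the
positive number $\epsilon$ converge to 0. Taking into account the preceding remarks,
we find in the limit

$$\varphi(\xi - 0) \ge \sum_{x_l < \xi} \frac{P_n(x_l)}{Q_n'(x_l)}$$
$$\varphi(\xi + 0) \le \sum_{x_l \le \xi} \frac{P_n(x_l)}{Q_n'(x_l)}$$

whence again

$$\varphi(\xi) \ge \sum_{x_l < \xi} \frac{P_n(x_l)}{Q_n'(x_l)}$$
$$\varphi(\xi) \le \sum_{x_l \le \xi} \frac{P_n(x_l)}{Q_n'(x_l)}.$$

But these inequalities follow directly from (17) by taking $v = \xi$.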

Since

$$\psi(v + 0) - \psi(v - 0) = \frac{P(v)}{Q'(v)}$$

it follows from inequalities (18) that

$$0 \le \varphi(v) - \psi(v - 0) \le \frac{P(v)}{Q'(v)}.$$

On the other hand, one easily finds that

$$\frac{P(v)}{Q'(v)} = \frac{1}{\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v)}.$$

But referring to the end of Sec. 4,

$$Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = \sum_{s=1}^{n} \alpha_sQ_{s-1}(v)^2$$

whence

$$\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v).$$

Finally,

$$0 \le \varphi(v) - \psi(v - 0) \le \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$

If $\varphi_1(v)$ is another distribution function with the same moments

$$m_0,\ m_1,\ m_2,\ \ldots,\ m_{2n},$$

we shall have also

$$0 \le \varphi_1(v) - \psi(v - 0) \le \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}$$

and as a consequence,

$$(19)\qquad |\varphi_1(v) - \varphi(v)| \le \chi_n(v),$$

a very important inequality. Here for brevity we use the notation

$$\chi_n(v) = \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$
7. Application to Normal Distribution. An important particular
case is that of a normal distribution characterized by the function

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$

In this case it is easy to give an explicit expression of the polynomials
$Q_n(x)$. Let

$$H_n(x) = e^{x^2}\,\frac{d^n e^{-x^2}}{dx^n}.$$

Integrating by parts, one can prove that for $l \le n - 1$

$$\int_{-\infty}^{\infty} e^{-x^2}x^lH_n(x)\,dx = 0.$$

Hence, one may conclude that $Q_n(x)$ differs from $H_n(x)$ by a constant
factor. Let

$$Q_n(x) = c_nH_n(x).$$

To determine $c_n$ we may use the relation

$$H_n(x) = -2xH_{n-1}(x) - 2(n - 1)H_{n-2}(x)$$

which can readily be established. Introducing polynomials $Q_n$, this
relation becomes

$$Q_n(x) = -2x\,\frac{c_n}{c_{n-1}}\,Q_{n-1}(x) - 2(n - 1)\,\frac{c_n}{c_{n-2}}\,Q_{n-2}(x).$$

Hence,

$$\frac{c_n}{c_{n-2}} = \frac{1}{2n - 2}; \qquad \alpha_n = -2\,\frac{c_n}{c_{n-1}}; \qquad \beta_n = 0.$$

Since $H_0(x) = Q_0(x) = 1$, we have $c_0 = 1$; also

$$\alpha_1 = 1 = -2\,\frac{c_1}{c_0},$$

whence $c_1 = -\tfrac{1}{2}$. The knowledge of $c_0$ and $c_1$ together with the relation

$$\frac{c_n}{c_{n-2}} = \frac{1}{2n - 2}$$

allows determination of all members of the sequence $c_2, c_3, c_4, \ldots$
The final expressions are as follows:

$$c_{2m} = \frac{1}{2^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1)}$$
$$c_{2m+1} = \frac{-1}{2^{m+1} \cdot 2 \cdot 4 \cdot 6 \cdots 2m}.$$

From the above relation between $H_n(x)$, $H_{n-1}(x)$, $H_{n-2}(x)$ and owing to
the fact that $H_n(x)$ is an even or odd polynomial, according as $n$ is even or
odd, one finds

$$H_{2m}(0) = (-2)^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1),$$

while another relation

$$H_n'(x) = -2nH_{n-1}(x),$$

following from the definition of $H_n(x)$, gives

$$H_{2m-1}'(0) = (-2)^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1).$$

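These preliminaries being established, we shall prove now that

$$\chi_n(v) = \frac{1}{c_nc_{n+1}\,(H_{n+1}'(v)H_n(v) - H_n'(v)H_{n+1}(v))}$$

attains its maximum for $v = 0$. Let

$$\Omega(v) = H_{n+1}'(v)H_n(v) - H_n'(v)H_{n+1}(v).$$

Then, taking into account the differential equation for polynomials
$H_n(v)$:

$$\frac{d^2H_n(v)}{dv^2} = 2vH_n'(v) - 2nH_n(v)$$

we find that

$$\frac{d\Omega}{dv} = 2v\Omega - 2H_n(v)H_{n+1}(v).$$

On the other hand,

$$\frac{H_n(v)}{H_{n+1}(v)} = \sum \frac{H_n(\xi)}{H_{n+1}'(\xi)}\,\frac{1}{v - \xi}$$

and, denoting roots of the polynomial $H_{n+1}(v)$ in general by $\xi$,

$$\frac{d}{dv}\,\frac{H_n(v)}{H_{n+1}(v)} = -\sum \frac{H_n(\xi)}{H_{n+1}'(\xi)}\,\frac{1}{(v - \xi)^2}.$$

Consequently

$$\Omega = H_{n+1}(v)^2 \sum \frac{H_n(\xi)}{H_{n+1}'(\xi)}\,\frac{1}{(v - \xi)^2}.$$

Again

$$H_n(v)H_{n+1}(v) = H_{n+1}(v)^2 \sum \frac{H_n(\xi)}{H_{n+1}'(\xi)}\,\frac{1}{v - \xi}$$

and so

$$\frac{d\Omega}{dv} = 2H_{n+1}(v)^2 \sum \frac{H_n(\xi)}{H_{n+1}'(\xi)}\,\frac{\xi}{(v - \xi)^2} = -\frac{H_{n+1}(v)^2}{n + 1} \sum \frac{\xi}{(v - \xi)^2}.$$

Roots of the polynomial $H_{n+1}(x)$ being symmetrically located with
respect to 0, we have:

$$\sum \frac{\xi}{(v - \xi)^2} = \frac{1}{2}\sum \xi\left[\frac{1}{(v - \xi)^2} - \frac{1}{(v + \xi)^2}\right] = 2v \sum \frac{\xi^2}{(v^2 - \xi^2)^2}$$

and finally

$$\frac{d\Omega}{dv} = -2v\,\frac{H_{n+1}(v)^2}{n + 1} \sum \frac{\xi^2}{(v^2 - \xi^2)^2}.$$

Hence

$$\frac{d\Omega}{dv} > 0 \quad\text{if}\quad v < 0; \qquad \frac{d\Omega}{dv} < 0 \quad\text{if}\quad v > 0;$$

that is, $\Omega(v)$ attains its maximum for $v = 0$ and $\chi_n(v)$ attains its maximum
for $v = 0$. Referring to the above expressions of $c_{2m}$, $c_{2m+1}$; $H_{2m}(0)$,
$H_{2m+1}'(0)$, we find that

$$\chi_{2m}(0) = \frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)}$$
$$\chi_{2m+1}(0) = \frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)}.$$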
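In Appendix I, page 354, we find the inequality

$$\frac{2 \cdot 4 \cdot 6 \cdots 2m}{1 \cdot 3 \cdot 5 \cdots (2m - 1)}\,\frac{1}{\sqrt{4m + 2}} < \frac{\sqrt{\pi}}{2}$$

whence

$$\frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)} < \sqrt{\frac{\pi}{4m + 2}}.$$

Thus, in all cases

$$\chi_n(v) \le \chi_n(0) < \sqrt{\frac{\pi}{2n}}$$

whence, by virtue of inequality (19),

$$|\varphi_1(v) - \varphi(v)| < \sqrt{\frac{\pi}{2n}}.$$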

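Thus any distribution function $\varphi_1(v)$ with the moments

$$m_0 = 1, \qquad m_{2k-1} = 0, \qquad m_{2k} = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{2^k} \qquad (k \le n)$$

corresponding to

$$\varphi(v) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx$$

differs from $\varphi(v)$ by less than

$$\sqrt{\frac{\pi}{2n}}.$$

Since this quantity tends to 0 when $n$ increases indefinitely, we have the
following theorem proved for the first time by Tshebysheff:

The system of infinitely many equations

$$\int_{-\infty}^{\infty} d\varphi(x) = 1; \qquad \int_{-\infty}^{\infty} x^{2k-1}\,d\varphi(x) = 0; \qquad \int_{-\infty}^{\infty} x^{2k}\,d\varphi(x) = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{2^k}$$

$$k = 1, 2, 3, \ldots$$

uniquely determines a never decreasing function $\varphi(x)$ such that $\varphi(-\infty) = 0$;
namely,

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$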
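
The bound used in the proof can be checked numerically. The Python sketch below is an illustration only, with function names of our own choosing. It evaluates $\chi_n(0)$ exactly from the relations of this section, namely $H_k(0) = -2(k - 1)H_{k-2}(0)$, $H_k'(0) = -2kH_{k-1}(0)$ and $c_k/c_{k-2} = 1/(2k - 2)$, and compares the result with the closed expression and with the bound $\sqrt{\pi/(2n)}$.

```python
from fractions import Fraction
import math

def chi_at_zero(n):
    """chi_n(0) = 1 / (c_n c_{n+1} (H'_{n+1}(0) H_n(0) - H'_n(0) H_{n+1}(0)))."""
    h = [Fraction(1), Fraction(0)]                 # H_0(0), H_1(0)
    c = [Fraction(1), Fraction(-1, 2)]             # c_0, c_1
    for k in range(2, n + 2):
        h.append(-2 * (k - 1) * h[k - 2])          # H_k(0) = -2(k-1) H_{k-2}(0)
        c.append(c[k - 2] / (2 * k - 2))           # c_k / c_{k-2} = 1/(2k-2)
    # H'_k(0) = -2k H_{k-1}(0), with H'_0 = 0
    dh = [Fraction(0)] + [-2 * k * h[k - 1] for k in range(1, n + 2)]
    omega = dh[n + 1] * h[n] - dh[n] * h[n + 1]
    return 1 / (c[n] * c[n + 1] * omega)

def closed_form(m):
    """(2*4*...*2m) / (3*5*...*(2m+1)), the common value of chi_{2m}(0) and chi_{2m+1}(0)."""
    value = Fraction(1)
    for j in range(1, m + 1):
        value *= Fraction(2 * j, 2 * j + 1)
    return value

table = [(n, chi_at_zero(n), math.sqrt(math.pi / (2 * n))) for n in range(1, 13)]
```

Each row of `table` holds $n$, the exact value of $\chi_n(0)$, and the bound; for instance $\chi_2(0) = \chi_3(0) = 2/3$, while the corresponding bounds are about 0.886 and 0.724.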




8. Tshebysheff-Markoff's Fundamental Theorem. When a mass $= 1$
is distributed according to the law characterized by a function $F(x, \lambda)$
depending upon a parameter $\lambda$, we say that the distribution is variable.
Notwithstanding the variability of distribution, it may happen that its
moments remain constant. If they are equal to moments of normal
distribution with density

$$\frac{e^{-x^2}}{\sqrt{\pi}}$$

then by the preceding theorem we have rigorously

$$F(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du$$

no matter what $\lambda$ is.
Generally moments of a variable distribution are themselves variable.
Suppose that each one of them, when $\lambda$ tends to a certain limit (for
instance $\infty$), tends to the corresponding moment of normal distribution.
One can foresee that under such circumstances $F(x, \lambda)$ will tend to

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$

In fact, the following fundamental theorem holds:

Fundamental Theorem. If, for a variable distribution characterized
by the function $F(x, \lambda)$,

$$\lim \int_{-\infty}^{\infty} x^k\,dF(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-x^2}x^k\,dx; \qquad \lambda \to \infty$$

for any fixed $k = 0, 1, 2, 3, \ldots$, then

$$\lim F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx; \qquad \lambda \to \infty$$

uniformly in $v$.

Proof. Let

$$m_0,\ m_1,\ m_2,\ \ldots,\ m_{2n}$$

be $2n + 1$ moments corresponding to a normal distribution. They
allow formation of the polynomials

$$Q_0(x),\ Q_1(x),\ \ldots,\ Q_n(x) \quad\text{and}\quad Q(x)$$

and the function designated in Sec. 6 by $\psi(x)$. Similar entities
corresponding to the variable distribution will be specified by an asterisk.
Since

$$m_k^* \to m_k \quad\text{as}\quad \lambda \to \infty$$

and since $\Delta_n > 0$, we shall have

$$\Delta_n^* > 0$$

for sufficiently large $\lambda$. Then $F(x, \lambda)$ will have not less than $n + 1$
points of increase and the whole theory can be applied to variable
distribution. In particular, we shall have

$$(20)\qquad \begin{aligned} 0 &\le \varphi(v) - \psi(v - 0) \le \chi_n(v)\\ 0 &\le F(v, \lambda) - \psi^*(v - 0) \le \chi_n^*(v). \end{aligned}$$

Now $Q_s^*(x)$ $(s = 0, 1, 2, \ldots, n)$ and $Q^*(x)$ depend rationally upon
$m_k^*$ $(k = 0, 1, 2, \ldots, 2n)$; hence, without any difficulty one can see that

$$Q_s^*(x) \to Q_s(x); \qquad s = 0, 1, 2, \ldots, n$$
$$Q^*(x) \to Q(x)$$

as $\lambda \to \infty$; whence,

$$\chi_n^*(v) \to \chi_n(v).$$

Again

$$\psi^*(v - 0) \to \psi(v - 0)$$

as $\lambda \to \infty$. A few explanations are necessary to prove this. At first let
$Q_n(v) \ne 0$. Then the polynomial $Q(x)$ will have $n + 1$ roots

$$\xi_1 < \xi_2 < \xi_3 < \cdots < \xi_{n+1}.$$

Since the roots of an algebraic equation vary continuously with its
coefficients, it is evident that for sufficiently large $\lambda$ the equation

$$Q^*(x) = 0$$

will have $n + 1$ roots:

$$\xi_1^* < \xi_2^* < \xi_3^* < \cdots < \xi_{n+1}^*$$

and $\xi_k^*$ will tend to $\xi_k$ as $\lambda \to \infty$. In this case, it is evident that $\psi^*(v - 0)$
will tend to $\psi(v - 0)$. If $Q_n(v) = 0$, it may happen that $\xi_1^*$ or $\xi_{n+1}^*$ tends
respectively to $-\infty$ or $+\infty$ as $\lambda \to \infty$, while the other roots tend to the
roots

$$x_1,\ x_2,\ \ldots,\ x_n$$

of the equation

$$Q_n(x) = 0.$$

But the terms in $\psi^*(v - 0)$ corresponding to infinitely increasing roots
tend to 0, and again

$$\psi^*(v - 0) \to \psi(v - 0).$$

Now

$$\chi_n(v) < \sqrt{\frac{\pi}{2n}}.$$

Consequently, given an arbitrary positive number $\epsilon$, we can select $n$ so
large as to have

$$\sqrt{\frac{\pi}{2n}} < \epsilon.$$
Having selected $n$ in this manner, we shall keep it fixed. Then by the
preceding remarks a number $L$ can be found so that

$$\chi_n^*(v) < \epsilon; \qquad |\psi^*(v - 0) - \psi(v - 0)| < \epsilon$$

for $\lambda > L$. Combining this with inequalities (20), we find

$$|F(v, \lambda) - \varphi(v)| < 3\epsilon$$

for $\lambda > L$. And this proves the convergence of $F(v, \lambda)$ to $\varphi(v)$ for a
fixed arbitrary $v$. To show that the equation

$$\lim F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx$$

holds uniformly for a variable $v$ we can follow a very simple reasoning due
to Pólya. Since $\varphi(-\infty) = 0$, $\varphi(+\infty) = 1$ and $\varphi(x)$ is an increasing
function, one can determine two numbers $a_0$ and $a_n$ so that

$$\varphi(x) \le \varphi(a_0) < \frac{\epsilon}{2} \quad\text{for}\quad x \le a_0$$

$$1 - \varphi(x) \le 1 - \varphi(a_n) < \frac{\epsilon}{2} \quad\text{for}\quad x \ge a_n.$$

Next, because $\varphi(x)$ is a continuous function, the interval $(a_0, a_n)$ can be
subdivided into partial intervals by inserting between $a_0$ and $a_n$ points
$a_1 < a_2 < \cdots < a_{n-1}$ so that

$$0 < \varphi(a_{k+1}) - \varphi(a_k) < \frac{\epsilon}{2}$$

for $k = 0, 1, 2, \ldots, n - 1$. By the preceding result, for all sufficiently
large $\lambda$

$$F(a_0, \lambda) < \frac{\epsilon}{2}; \qquad 1 - F(a_n, \lambda) < \frac{\epsilon}{2}$$

and

$$|F(a_k, \lambda) - \varphi(a_k)| < \frac{\epsilon}{2}; \qquad k = 1, 2, \ldots, n - 1.$$

Now consider the interval $(-\infty, a_0)$. Here for $v \le a_0$

$$0 \le F(v, \lambda) < \frac{\epsilon}{2}, \qquad 0 < \varphi(v) < \frac{\epsilon}{2}$$

and

$$|F(v, \lambda) - \varphi(v)| < \epsilon.$$
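For $v$ belonging to the interval $(a_n, +\infty)$

$$0 \le 1 - F(v, \lambda) < \frac{\epsilon}{2}, \qquad 0 < 1 - \varphi(v) < \frac{\epsilon}{2}$$

whence again

$$|F(v, \lambda) - \varphi(v)| < \epsilon.$$

Finally, let

$$a_k \le v < a_{k+1} \qquad (k = 0, 1, 2, \ldots, n - 1).$$

Then

$$F(v, \lambda) - \varphi(v) \ge F(a_k, \lambda) - \varphi(a_{k+1}) = [F(a_k, \lambda) - \varphi(a_k)] + [\varphi(a_k) - \varphi(a_{k+1})]$$

$$F(v, \lambda) - \varphi(v) \le F(a_{k+1}, \lambda) - \varphi(a_k) = [F(a_{k+1}, \lambda) - \varphi(a_{k+1})] + [\varphi(a_{k+1}) - \varphi(a_k)].$$

But

$$F(a_k, \lambda) - \varphi(a_k) > -\frac{\epsilon}{2}; \qquad \varphi(a_k) - \varphi(a_{k+1}) > -\frac{\epsilon}{2}$$

$$F(a_{k+1}, \lambda) - \varphi(a_{k+1}) < \frac{\epsilon}{2}; \qquad \varphi(a_{k+1}) - \varphi(a_k) < \frac{\epsilon}{2}$$

whence

$$-\epsilon < F(v, \lambda) - \varphi(v) < \epsilon.$$

Thus, given $\epsilon$, there exists a number $L(\epsilon)$ depending upon $\epsilon$ alone and
such that

$$|F(v, \lambda) - \varphi(v)| < \epsilon$$

for $\lambda > L(\epsilon)$ no matter what value is attributed to $v$.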

The fundamental theorem with reference to probability can be stated 
as follows: 

Let $s_n$ be a stochastic variable depending upon a variable positive integer 
$n$. If the mathematical expectation $E(s_n^k)$ for any fixed $k = 1, 2, 3, \ldots$ 
tends, as $n$ increases indefinitely, to the corresponding expectation 

$$E(x^k) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} x^k e^{-x^2}\,dx$$

of a normally distributed variable, then the probability of the inequality 

$$s_n < v$$

tends to the limit 

$$\frac{1}{\sqrt{\pi}} \int_{-\infty}^{v} e^{-x^2}\,dx$$

and that uniformly in $v$. 
In very many cases it is much easier to make sure that the conditions 
of this theorem are fulfilled and then, in one stroke, to pass to the limit 
theorem for probability, than to attack the problem directly. 

Application to Sums of Independent Variables 

9. Let $z_1, z_2, z_3, \ldots$ be independent variables whose number can be 
increased indefinitely. Without losing anything in generality, we may 
suppose from the beginning 

$$E(z_k) = 0; \qquad k = 1, 2, 3, \ldots.$$

We assume the existence of 

$$E(z_k^2) = b_k$$

for all $k = 1, 2, 3, \ldots$. Also, we assume for some positive $\delta$ the 
existence of absolute moments 

$$E|z_k|^{2+\delta} = \mu_k^{(2+\delta)}; \qquad k = 1, 2, 3, \ldots.$$

Liapounoff's theorem, with which we dealt at length in Chap. XIV, 
states that the probability of the inequality 

$$\frac{z_1 + z_2 + \cdots + z_n}{\sqrt{2B_n}} < t,$$

where 

$$B_n = b_1 + b_2 + \cdots + b_n,$$

tends uniformly to the limit 

$$\frac{1}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\,dx$$

as $n \to \infty$, provided 

$$\frac{\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)}}{B_n^{1 + \frac{\delta}{2}}} \to 0.$$
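As an illustration of Liapounoff's condition, the following minimal sketch evaluates the ratio $C_n / B_n^{1+\delta/2}$ for identically distributed variables. The example distribution (uniform on $(-1, 1)$, for which $E(z^2) = 1/3$ and $E|z|^3 = 1/4$), the choice $\delta = 1$, and the function name `liapounoff_ratio` are assumptions made for illustration only; they do not appear in the text.

```python
def liapounoff_ratio(b, mu, delta):
    """Return C_n / B_n**(1 + delta/2), where B_n is the sum of the second
    moments b_k and C_n is the sum of the absolute moments of order 2 + delta."""
    B_n = sum(b)
    C_n = sum(mu)
    return C_n / B_n ** (1 + delta / 2)

# Assumed example: variables uniform on (-1, 1), so E z^2 = 1/3 and E|z|^3 = 1/4.
sizes = (10, 100, 1000, 10000)
ratios = [liapounoff_ratio([1 / 3] * n, [1 / 4] * n, 1.0) for n in sizes]

for n, r in zip(sizes, ratios):
    print(n, r)
```

For identical variables the ratio is proportional to $n^{-\delta/2}$, so in this example it decreases like $1/\sqrt{n}$ and the condition is fulfilled.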


Liapounoff's result in regard to generality of conditions surpassed by 
far what had been established before by Tshebysheff and Markoff, whose 
proofs were based on the fundamental result derived in the preceding 
section. Since Liapounoff's conditions do not require the existence of 
moments in an infinite number, it seemed that the method of moments 
was not powerful enough to establish the limit theorem in such a general 
form. Nevertheless, by resorting to an ingenious artifice, of which we 
made use in Chap. X, Sec. 8, Markoff finally succeeded in proving the 
limit theorem by the method of moments to the same degree of generality 
as did Liapounoff. 
Markoff's artifice consists in associating with the variable $z_k$ two new 
variables $x_k$ and $y_k$ defined as follows: 

Let $N$ be a positive number which in the course of proof will be 
selected so as to tend to infinity together with $n$. Then 

$$x_k = z_k, \quad y_k = 0 \quad \text{if } |z_k| \le N$$

$$x_k = 0, \quad y_k = z_k \quad \text{if } |z_k| > N.$$

Evidently $z_k$, $x_k$, $y_k$ are connected by the relation 

$$z_k = x_k + y_k$$

whence 

$$(21)\qquad E(x_k) + E(y_k) = 0.$$

Moreover 

$$(22)\qquad \begin{aligned} E(x_k^2) + E(y_k^2) &= E(z_k^2) = b_k \\ E|x_k|^{2+\delta} + E|y_k|^{2+\delta} &= E|z_k|^{2+\delta} = \mu_k^{(2+\delta)} \end{aligned}$$

as one can see immediately from the definition of $x_k$ and $y_k$. 
Since $x_k$ is bounded, mathematical expectations 

$$E(x_k^l)$$

exist for all integer exponents $l = 1, 2, 3, \ldots$ and for $k = 1, 2, 3, \ldots$. 
In the following we shall use the notations 

$$|E(x_k^l)| = c_k^{(l)}; \qquad l = 1, 2, 3, \ldots$$

$$c_1^{(2)} + c_2^{(2)} + \cdots + c_n^{(2)} = B_n'$$

$$\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)} = C_n.$$
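The truncation of $z_k$ into a bounded part $x_k$ and a tail part $y_k$ can be checked on a small example. The sketch below uses exact rational arithmetic; the five-point distribution, the level $N = 3/2$, $\delta = 1$, and the helper names `expectation` and `split` are hypothetical choices made only to illustrate relations (21) and (22).

```python
from fractions import Fraction as Fr

def expectation(dist, g):
    """Expectation of g(value) for a finite distribution [(value, probability), ...]."""
    return sum(p * g(v) for v, p in dist)

def split(dist, N):
    """Split z into x (equal to z when |z| <= N, else 0) and y (equal to z when |z| > N, else 0)."""
    x = [(v if abs(v) <= N else 0, p) for v, p in dist]
    y = [(0 if abs(v) <= N else v, p) for v, p in dist]
    return x, y

# Assumed example: a variable with mean 0 taking five values.
z = [(-3, Fr(1, 10)), (-1, Fr(3, 10)), (0, Fr(1, 5)), (1, Fr(1, 5)), (2, Fr(1, 5))]
N = Fr(3, 2)
x, y = split(z, N)

mean_sum = expectation(x, lambda v: v) + expectation(y, lambda v: v)       # relation (21)
second_sum = expectation(x, lambda v: v * v) + expectation(y, lambda v: v * v)
b = expectation(z, lambda v: v * v)                                        # E(z^2)
mu = expectation(z, lambda v: abs(v) ** 3)                                 # E|z|^(2+delta), delta = 1
q = sum(p for v, p in y if v != 0)                                         # probability that y != 0

print(mean_sum, second_sum == b, q <= mu / N ** 3)
```

The last comparison is the single-variable form of the bound that Lemma 1 sums over $k$.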

Not to obscure the essential steps of the reasoning we shall first 
establish a few preliminary results. 

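Lemma 1. Let $q_k$ represent the probability that $y_k \ne 0$; then 

$$q_1 + q_2 + \cdots + q_n \le \frac{C_n}{N^{2+\delta}}.$$

Proof. Let $\varphi_k(x)$ be the distribution function of $z_k$. Since $y_k \ne 0$ 
only if $|z_k| > N$, the probability $q_k$ is not greater than 

$$\int_{-\infty}^{-N} d\varphi_k(x) + \int_{N}^{\infty} d\varphi_k(x).$$

On the other hand, 

$$\int_{-\infty}^{-N} |x|^{2+\delta}\,d\varphi_k(x) + \int_{N}^{\infty} |x|^{2+\delta}\,d\varphi_k(x) \le \mu_k^{(2+\delta)}.$$

But 

$$\int_{-\infty}^{-N} |x|^{2+\delta}\,d\varphi_k(x) + \int_{N}^{\infty} |x|^{2+\delta}\,d\varphi_k(x) \ge N^{2+\delta}\left[\int_{-\infty}^{-N} d\varphi_k(x) + \int_{N}^{\infty} d\varphi_k(x)\right]$$

whence 

$$q_k \le \int_{-\infty}^{-N} d\varphi_k(x) + \int_{N}^{\infty} d\varphi_k(x) \le \frac{\mu_k^{(2+\delta)}}{N^{2+\delta}}.$$

The inequality to be proved follows immediately. 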

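Lemma 2. The following inequality holds: 

$$1 \ge \frac{B_n'}{B_n} \ge 1 - \frac{C_n}{B_n N^{\delta}}.$$

Proof. From 

$$E|y_k|^{2+\delta} \le \mu_k^{(2+\delta)}$$

which is a consequence of the second equation (22) it follows that 

$$E(y_k^2) \le \frac{\mu_k^{(2+\delta)}}{N^{\delta}}.$$

The first equation (22) 

$$c_k^{(2)} + E(y_k^2) = b_k$$

gives 

$$b_k \ge c_k^{(2)} \ge b_k - \frac{\mu_k^{(2+\delta)}}{N^{\delta}}.$$

Taking the sum for $k = 1, 2, 3, \ldots, n$, we get 

$$B_n \ge B_n' \ge B_n - \frac{C_n}{N^{\delta}}$$

whence 

$$1 \ge \frac{B_n'}{B_n} \ge 1 - \frac{C_n}{B_n N^{\delta}}.$$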


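Lemma 3. For $e \ge 3$, 

$$\frac{c_1^{(e)} + c_2^{(e)} + \cdots + c_n^{(e)}}{B_n^{e/2}} \le \left(\frac{N^2}{B_n}\right)^{\frac{e-2}{2}}.$$

Proof. This inequality follows immediately from the evident 
inequalities 

$$c_k^{(e)} \le E|x_k|^e \le N^{e-2} E(x_k^2) \le N^{e-2} b_k.$$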
Lemma 4. The following inequality holds: 

$$\frac{c_1^{(1)} + c_2^{(1)} + \cdots + c_n^{(1)}}{B_n^{1/2}} \le \left(\frac{C_n}{N^{2+\delta}}\right)^{\frac{1}{2}}.$$

Proof. Since 

$$E(x_k) + E(y_k) = 0,$$

we have 

$$c_k^{(1)} = |E(x_k)| = |E(y_k)| \le E|y_k|.$$

On the other hand, by virtue of Schwarz's inequality 

$$[E|y_1| + E|y_2| + \cdots + E|y_n|]^2 \le (q_1 + q_2 + \cdots + q_n) \sum_{k=1}^{n} E(y_k^2) \le \frac{B_n C_n}{N^{2+\delta}}$$

whence the statement follows immediately. 

If the variable integer $N$ should be subject to the requirements that 
both the ratios 

$$\frac{C_n}{N^{2+\delta}} \quad \text{and} \quad \frac{N^2}{B_n}$$

should tend to 0 when $n$ increases indefinitely, then the preceding lemmas 
would give three important corollaries. But before stating these 
corollaries we must ascertain the possibility of selecting $N$ as required. 
It suffices to take 


Then 


N = 


Cn 

Bn 



0 


by virtue of Liapounoff's condition. 
Also 



will tend to 0. By selecting N in this manner we can state the following 
corollaries : 

Corollary 1. The sum 

$$q_1 + q_2 + \cdots + q_n$$

tends to 0 as $n \to \infty$. 

Corollary 2. The ratio 

$$\frac{B_n'}{B_n}$$

tends to 1. 

Corollary 3. The ratio 

$$\frac{c_1^{(e)} + c_2^{(e)} + \cdots + c_n^{(e)}}{B_n^{e/2}}$$

tends to 0 for all positive integer exponents $e$ except $e = 2$. 

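10. Let $F_n(t)$ and $\phi_n(t)$ represent, respectively, the probabilities of the 
inequalities 

$$\frac{z_1 + z_2 + \cdots + z_n}{\sqrt{2B_n}} < t$$

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} < t.$$

By repeating the reasoning developed in Chap. X, Sec. 8, we find that 

$$|F_n(t) - \phi_n(t)| \le q_1 + q_2 + \cdots + q_n.$$

Hence, 

$$\lim\,(F_n(t) - \phi_n(t)) = 0 \quad \text{as } n \to \infty$$

by Corollary 1. It suffices therefore to show 

$$\phi_n(t) \to \frac{1}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\,dx \quad \text{as } n \to \infty,$$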


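and that can be done by the method of moments. By the polynomial 
theorem 

$$\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \sum \frac{m!}{\alpha!\,\beta! \cdots \lambda!}\,\frac{S_{\alpha, \beta, \ldots, \lambda}}{2^{m/2} B_n^{m/2}}$$

where the summation extends over all systems of positive integers 
$\alpha \ge \beta \ge \cdots \ge \lambda$ satisfying the condition 

$$\alpha + \beta + \cdots + \lambda = m$$

and $S_{\alpha, \beta, \ldots, \lambda}$ denotes a symmetrical function of letters $x_1, x_2, \ldots, x_n$ 
determined by one of its terms 

$$x_1^{\alpha} x_2^{\beta} \cdots x_l^{\lambda}$$

if $l$ represents the number of integers $\alpha, \beta, \ldots, \lambda$. Since variables 
$x_1, x_2, \ldots, x_n$ are independent, we have 

$$E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \sum \frac{m!}{\alpha!\,\beta! \cdots \lambda!}\,\frac{G_{\alpha, \beta, \ldots, \lambda}}{2^{m/2} B_n^{m/2}}$$

where $G_{\alpha, \beta, \ldots, \lambda}$ is obtained by replacing powers of variables by 
mathematical expectations of these powers. It is almost evident that 

$$\frac{|G_{\alpha, \beta, \ldots, \lambda}|}{B_n^{m/2}} \le \frac{c_1^{(\alpha)} + c_2^{(\alpha)} + \cdots + c_n^{(\alpha)}}{B_n^{\alpha/2}} \cdot \frac{c_1^{(\beta)} + c_2^{(\beta)} + \cdots + c_n^{(\beta)}}{B_n^{\beta/2}} \cdots \frac{c_1^{(\lambda)} + c_2^{(\lambda)} + \cdots + c_n^{(\lambda)}}{B_n^{\lambda/2}}.$$

Now if not all the exponents $\alpha, \beta, \ldots, \lambda$ are $= 2$ (all of them can be 
equal to 2 only when $m$ is even), by virtue of Corollary 3 the right member as well as 

$$\frac{G_{\alpha, \beta, \ldots, \lambda}}{B_n^{m/2}}$$

tends to 0. Hence 

$$\lim E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = 0$$

if $m$ is odd. 

But for even $m$ we have 

$$(23)\qquad \lim \left[E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m - \frac{m!}{2^m}\,\frac{G_{2, 2, \ldots, 2}}{B_n^{m/2}}\right] = 0.$$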


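Let us consider now ($m$ being even) 

$$\left(\frac{c_1^{(2)} + c_2^{(2)} + \cdots + c_n^{(2)}}{B_n}\right)^{m/2} = \sum \frac{(m/2)!}{\lambda!\,\mu! \cdots \omega!}\,\frac{H_{\lambda, \mu, \ldots, \omega}}{B_n^{m/2}}$$

where summation extends over all systems of positive integers 

$$\lambda \ge \mu \ge \cdots \ge \omega$$

satisfying the condition 

$$\lambda + \mu + \cdots + \omega = \frac{m}{2}$$

and $H_{\lambda, \mu, \ldots, \omega}$ is a symmetric function of $c_1^{(2)}, c_2^{(2)}, \ldots, c_n^{(2)}$ determined by 
its term 

$$(c_1^{(2)})^{\lambda} (c_2^{(2)})^{\mu} \cdots (c_l^{(2)})^{\omega},$$

$l$ being the number of subscripts $\lambda, \mu, \ldots, \omega$. Apparently 

$$\frac{H_{\lambda, \mu, \ldots, \omega}}{B_n^{m/2}} \le \frac{(c_1^{(2)})^{\lambda} + \cdots + (c_n^{(2)})^{\lambda}}{B_n^{\lambda}} \cdots \frac{(c_1^{(2)})^{\omega} + \cdots + (c_n^{(2)})^{\omega}}{B_n^{\omega}}.$$

Besides 

$$c_k^{(2e)} \ge (c_k^{(2)})^e$$

and 

$$\frac{(c_1^{(2)})^e + (c_2^{(2)})^e + \cdots + (c_n^{(2)})^e}{B_n^{e}} \le \frac{c_1^{(2e)} + c_2^{(2e)} + \cdots + c_n^{(2e)}}{B_n^{e}} \to 0$$

if $e > 1$. Thus 

$$\frac{H_{\lambda, \mu, \ldots, \omega}}{B_n^{m/2}} \to 0$$

if not all subscripts $\lambda, \mu, \ldots, \omega$ are equal to 1. It follows that 

$$\lim \left[\left(\frac{B_n'}{B_n}\right)^{m/2} - \left(\frac{m}{2}\right)!\,\frac{H_{1, 1, \ldots, 1}}{B_n^{m/2}}\right] = 0.$$

But by Corollary 2 

$$\lim \frac{B_n'}{B_n} = 1$$

and evidently $H_{1, 1, \ldots, 1} = G_{2, 2, \ldots, 2}$. Hence 

$$\lim \frac{G_{2, 2, \ldots, 2}}{B_n^{m/2}} = \frac{1}{(m/2)!}$$

and this in connection with (23) shows that for an even $m$ 

$$\lim E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \frac{m!}{2^m\,(m/2)!}.$$

Finally, no matter whether the exponent $m$ is odd or even, we have 

$$\lim E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} x^m e^{-x^2}\,dx.$$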
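The limiting moments can be checked numerically. The following sketch compares the closed form $m!/(2^m (m/2)!)$ for even $m$ (and 0 for odd $m$) with a midpoint-rule approximation of $\frac{1}{\sqrt{\pi}}\int x^m e^{-x^2}dx$. The integration range, the step count, and the function names are assumptions chosen for illustration.

```python
import math

def normal_moment_closed_form(m):
    """Moment of order m for the density exp(-x^2)/sqrt(pi)."""
    if m % 2 == 1:
        return 0.0
    return math.factorial(m) / (2 ** m * math.factorial(m // 2))

def normal_moment_numeric(m, half_width=8.0, steps=20000):
    """Midpoint-rule approximation of (1/sqrt(pi)) * integral of x^m exp(-x^2) dx.

    The integrand is negligible outside (-half_width, half_width) for small m.
    """
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += x ** m * math.exp(-x * x)
    return total * h / math.sqrt(math.pi)

for m in range(0, 9):
    print(m, normal_moment_closed_form(m), normal_moment_numeric(m))
```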
Tshebysheff-Markoff's fundamental theorem can be applied directly 
and leads to the result: 

$$\lim \phi_n(t) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\,dx$$

uniformly in $t$. On the other hand, as has been established before, 

$$\lim\,[F_n(t) - \phi_n(t)] = 0$$

uniformly in $t$. Hence, finally 

$$\lim F_n(t) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{t} e^{-x^2}\,dx$$

uniformly in $t$. 

And this is the fundamental limit theorem with Liapounoff’s condi- 
tions now proved by the method of moments. This proof, due to 
Markoff, is simple enough and of high elegance. However, preliminary 
considerations which underlie the proof of the fundamental theorem, 
though simple and elegant also, are rather long. Nevertheless, we must 
bear in mind that they are not only useful in connection with the theory 
of probability, but they have great importance in other fields of analysis. 
ON A GAUSSIAN PROBLEM 

1. In a letter to Laplace dated January 30, 1812,¹ Gauss mentions a 
difficult problem in probability for which he could not find a perfectly 
satisfactory solution. We quote from his letter: 

I recall, however, a curious problem with which I occupied myself 12 years 
ago, but which I did not then succeed in solving to my satisfaction. Perhaps 
you will deign to give it a few moments: in that case I am sure that you 
will find a more complete solution. Here it is: Let $M$ be an unknown quantity 
between the limits 0 and 1 for which all values are either equally probable 
or more or less so according to a given law; suppose it converted into a 
continued fraction 

$$M = \cfrac{1}{a' + \cfrac{1}{a'' + \cdots}}$$

What is the probability that, on stopping the expansion at a finite term 
$a^{(n)}$, the following fraction 

$$\cfrac{1}{a^{(n+1)} + \cfrac{1}{a^{(n+2)} + \cdots}}$$

lies between the limits 0 and $x$? I denote it by $P(n, x)$ and, supposing all 
values equally probable, I have 

$$P(0, x) = x.$$

$P(1, x)$ is a transcendental function depending on the function 

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{x}$$

which Euler calls inexplicable and on which I have just given several 
investigations in a memoir presented to our Society of Sciences, which will 
soon be printed. But for the case where $n$ is larger, the exact value of $P(n, x)$ 
seems intractable. However, I have found by very simple reasoning that for 
infinite $n$ 

$$P(n, x) = \frac{\log (1 + x)}{\log 2}.$$

But the efforts which I made at the time of my investigations to assign 

$$P(n, x) - \frac{\log (1 + x)}{\log 2}$$

for a very large, but not infinite, value of $n$ have been fruitless. 

¹ Gauss' Werke, X, 1, p. 371. 

The problem itself and the main difficulty in its solution are clearly 
indicated in this passage. The problem is difficult indeed, and no 
satisfactory solution was offered before 1928, when Professor R. O. 
Kuzmin succeeded in solving it in a very remarkable and elegant way. 

2. Analytical Expression for $P_n(x)$. We shall use the notation 
$P_n(x)$ for the probability which Gauss designated by $P(n, x)$. The first 
question that presents itself is how to express $P_n(x)$ in a proper analytical 
form. Let $\delta(v_1, v_2, \ldots, v_n, x)$ be an interval whose end points are 
represented by two continued fractions: 

$$\cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n}}}$$

and 

$$\cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n + x}}}$$

with positive integer incomplete quotients $v_1, v_2, \ldots, v_n$ while $x$ is a 
positive number $\le 1$. Two such intervals corresponding to two different 
systems of integers $v_1, v_2, \ldots, v_n$ and $v_1', v_2', \ldots, v_n'$ do not overlap; 
that is, do not have common inner points. For, if they had a common 
inner point represented by an irrational number $N$ (which we can always 
suppose), we should have for some positive $x' < 1$ and $x'' < 1$ 

$$N = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n + x'}}} = \cfrac{1}{v_1' + \cfrac{1}{v_2' + \cdots + \cfrac{1}{v_n' + x''}}}.$$

But that is impossible unless $v_1' = v_1$, $v_2' = v_2$, $\ldots$, $v_n' = v_n$. 

A number $M$ being selected at random between 0 and 1 and converted 
into a continued fraction 

$$M = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n + \xi}}},$$

if the quantity $\xi$ turns out to be contained between 0 and $x < 1$, $M$ must 
belong to one (and only one) of the intervals $\delta(v_1, v_2, \ldots, v_n, x)$ 
corresponding to one of all the possible systems of $n$ positive integers 
$v_1, v_2, \ldots, v_n$. Since $M$ has a uniform distribution of probability and 
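since the length of the interval $\delta(v_1, v_2, \ldots, v_n, x)$ is 

$$(-1)^n \left[\cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n + x}}} - \cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n}}}\right]$$

the required probability $P_n(x)$ will be expressed by the sum 

$$P_n(x) = \sum_{v_1, v_2, \ldots, v_n} (-1)^n \left[\cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n + x}}} - \cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n}}}\right]$$

extended over all systems of positive integers $v_1, v_2, \ldots, v_n$. In general 
let 

$$\frac{P_i}{Q_i} = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_i}}} \qquad (i = 1, 2, \ldots, n)$$

be a convergent to the continued fraction 

$$\cfrac{1}{v_1 + \cfrac{1}{v_2 + \cdots + \cfrac{1}{v_n}}}.$$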

Then the above expression for $P_n(x)$ can be exhibited in a more convenient 
form: 

$$(1)\qquad P_n(x) = \sum (-1)^n \left[\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}} - \frac{P_n}{Q_n}\right].$$

By the very definition of $P_n(x)$ we must have $P_n(1) = 1$; hence the 
important relation 

$$(2)\qquad \sum \frac{1}{Q_n(Q_n + Q_{n-1})} = 1.$$

This result can also be established directly by resorting to the original 
expression of $P_n(1)$ and performing summation first with respect to $v_1$, 
then with respect to $v_2$, etc. 

Relation (2) can be interpreted as follows: Let $\delta$ in general be the 
length of an interval $\delta(v_1, v_2, \ldots, v_n, 1)$. Then 

$$\sum \delta = 1$$

summation being extended over the (enumerable) set of intervals $\delta$. 
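The interval lengths entering relation (2) can be verified with exact rational arithmetic. The sketch below checks that an interval $\delta(v_1, \ldots, v_n, 1)$ has length $1/(Q_n(Q_n + Q_{n-1}))$ and that the lengths obtained by varying one further quotient add up, in the limit, to the length of the enclosing interval. It assumes the usual recurrence $Q_n = v_n Q_{n-1} + Q_{n-2}$ with $Q_0 = 1$, $Q_{-1} = 0$; the function names, and the choice to sum over the last quotient, are illustrative and not taken from the text.

```python
from fractions import Fraction

def convergent(quotients, x=0):
    """Exact value of 1/(v_1 + 1/(v_2 + ... + 1/(v_n + x)))."""
    value = Fraction(x)
    for v in reversed(quotients):
        value = 1 / (v + value)
    return value

def denominators(quotients):
    """Return (Q_{n-1}, Q_n) from Q_n = v_n*Q_{n-1} + Q_{n-2}, with Q_0 = 1, Q_{-1} = 0."""
    q_prev, q = 0, 1
    for v in quotients:
        q_prev, q = q, v * q + q_prev
    return q_prev, q

def interval_length(quotients):
    """Length of the interval delta(v_1, ..., v_n, 1), computed from its end points."""
    return abs(convergent(quotients, 1) - convergent(quotients, 0))

def partial_sum(prefix, V):
    """Sum of the lengths of delta(prefix, v, 1) for v = 1, ..., V."""
    return sum(interval_length(list(prefix) + [v]) for v in range(1, V + 1))

prefix = [2, 3]
q_prev, q = denominators(prefix)
V = 50
# What is still missing from the enclosing interval after V terms:
gap = interval_length(prefix) - partial_sum(prefix, V)
print(gap, Fraction(1, q * ((V + 1) * q + q_prev)))
```

The gap shrinks to 0 as `V` grows, so each interval is exhausted by its subintervals; repeating the argument level by level gives $\sum \delta = 1$.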
3. The Derivative of $P_n(x)$. In attempting to show that $P_n(x)$ 
tends uniformly to a limit function as $n \to \infty$ it is easier to begin with its 
derivative $p_n(x)$. Series 

$$\sum \frac{1}{(Q_n + xQ_{n-1})^2}$$

obtained by formal derivation of (1) is uniformly convergent in the 
interval (0, 1). For 

$$Q_n \ge \frac{Q_n + Q_{n-1}}{2}$$

whence 

$$\frac{1}{(Q_n + xQ_{n-1})^2} \le \frac{2}{Q_n(Q_n + Q_{n-1})}$$

and the series 

$$\sum \frac{1}{Q_n(Q_n + Q_{n-1})}$$

is convergent. Hence 

$$p_n(x) = \frac{dP_n(x)}{dx} = \sum \frac{1}{(Q_n + xQ_{n-1})^2}.$$

Since 

$$Q_n = v_n Q_{n-1} + Q_{n-2}$$

we have 

$$p_n(x) = \sum \frac{1}{\left(Q_{n-1} + \dfrac{Q_{n-2}}{v_n + x}\right)^2} \cdot \frac{1}{(v_n + x)^2}$$

and, performing summation with respect to $v_1, \ldots, v_{n-1}$ for constant 
$v_n$ 

$$\sum_{v_1, \ldots, v_{n-1}} \frac{1}{\left(Q_{n-1} + \dfrac{Q_{n-2}}{v_n + x}\right)^2} = p_{n-1}\left(\frac{1}{v_n + x}\right)$$

whence 

$$p_n(x) = \sum_{v_n = 1}^{\infty} p_{n-1}\left(\frac{1}{v_n + x}\right) \frac{1}{(v_n + x)^2}$$

or else 

$$(3)\qquad p_n(x) = \sum_{v=1}^{\infty} p_{n-1}\left(\frac{1}{v + x}\right) \frac{1}{(v + x)^2},$$

an important recurrence relation which permits determining completely 
the sequence of functions 

$$p_0(x), p_1(x), p_2(x), \ldots$$

starting with $p_0(x) = 1$. 
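Recurrence (3) lends itself to a direct numerical experiment. The sketch below iterates it on a grid, starting from $p_0(x) = 1$, and measures how far $(1 + x)\,p_n(x) \log 2$ is from 1, that is, how far $p_n(x)$ is from the derivative of the limit $\log(1+x)/\log 2$ stated by Gauss. The grid size, the truncation of the series at `TERMS` with a crude tail estimate, the linear interpolation, and all names are assumptions of this illustration, not part of the text.

```python
import math

GRID = 200    # number of subintervals of [0, 1] (assumed discretization)
TERMS = 200   # number of series terms summed explicitly (assumed truncation)

def interpolate(values, u):
    """Piecewise-linear interpolation of grid values at a point u in [0, 1]."""
    pos = u * GRID
    j = min(int(pos), GRID - 1)
    frac = pos - j
    return values[j] * (1 - frac) + values[j + 1] * frac

def step(values):
    """One application of the recurrence f(x) -> sum over v of f(1/(v+x)) / (v+x)^2."""
    new = []
    for i in range(GRID + 1):
        x = i / GRID
        total = 0.0
        for v in range(1, TERMS + 1):
            u = 1.0 / (v + x)
            total += interpolate(values, u) * u * u
        # Tail v > TERMS: there u is close to 0, so f(u) is replaced by f(0)
        # and the sum of 1/(v+x)^2 is approximated by 1/(TERMS + x + 1/2).
        total += values[0] / (TERMS + x + 0.5)
        new.append(total)
    return new

def deviation(values):
    """Largest value of |(1+x) * f(x) * log 2 - 1| over the grid."""
    return max(abs((1 + i / GRID) * values[i] * math.log(2) - 1) for i in range(GRID + 1))

p = [1.0] * (GRID + 1)          # p_0(x) = 1
deviations = [deviation(p)]
for _ in range(6):
    p = step(p)
    deviations.append(deviation(p))

print(deviations)
```

In this experiment the deviation shrinks rapidly with each application of the recurrence, until it reaches the level of the discretization error.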

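4. Discussion of a More General Recurrence Relation. In discussing 
relation (3) the fact that $p_0(x) = 1$ is of no consequence. We may start 
with any function $f_0(x)$ subject to some natural limitations, and form a 
sequence 

$$f_1(x), f_2(x), f_3(x), \ldots$$

by means of the recurrence relation 

$$(4)\qquad f_n(x) = \sum_{v=1}^{\infty} f_{n-1}\left(\frac{1}{v + x}\right) \frac{1}{(v + x)^2}.$$

The following properties of $f_n(x)$ follow easily from this relation: 
a. If 

$$f_0(x) = \frac{H}{1 + x}$$

then 

$$f_n(x) = \frac{H}{1 + x}; \qquad n = 1, 2, 3, \ldots.$$

For 

$$f_1(x) = H \sum_{v=1}^{\infty} \frac{1}{(v + x)(v + x + 1)} = H \sum_{v=1}^{\infty} \left(\frac{1}{v + x} - \frac{1}{v + x + 1}\right) = \frac{H}{1 + x}$$

whence the general statement follows immediately. 
b. If 

$$\frac{m}{1 + x} \le f_0(x) \le \frac{M}{1 + x}$$

then 

$$\frac{m}{1 + x} \le f_n(x) \le \frac{M}{1 + x}.$$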
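Follows from (a) and equation (4) itself. 

As a corollary we have: Let $M_n$ and $m_n$ be the precise upper and 
lower bounds of 

$$(1 + x) f_n(x) \qquad (n = 0, 1, 2, \ldots)$$

in the interval $0 \le x \le 1$. Then 

$$M_0 \ge M_1 \ge M_2 \ge \cdots$$
$$m_0 \le m_1 \le m_2 \le \cdots$$

c. We have 

$$\int_0^1 f_n(x)\,dx = \sum_{v=1}^{\infty} \int_0^1 f_{n-1}\left(\frac{1}{v + x}\right) \frac{dx}{(v + x)^2} = \int_0^1 f_{n-1}(x)\,dx.$$

d. The following relations can easily be established by mathematical 
induction: 

$$f_n(x) = \sum f_0\left(\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}}\right) \frac{1}{(Q_n + xQ_{n-1})^2}$$

$$f_{2n}(x) = \sum f_n\left(\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}}\right) \frac{1}{(Q_n + xQ_{n-1})^2}$$

$$f_{3n}(x) = \sum f_{2n}\left(\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}}\right) \frac{1}{(Q_n + xQ_{n-1})^2}$$

$$\cdots$$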

Let us suppose now that the function $f_0(x)$ defined in the interval 

$$0 \le x \le 1$$

possesses a derivative everywhere in this interval and let $\mu_0$ be an upper 
bound of $|f_0'(x)|$ while $M$ is an upper bound of $|(1 + x) f_0(x)|$. Then by 
property (b) 

$$|f_n(x)| \le M; \quad |f_{2n}(x)| \le M; \quad |f_{3n}(x)| \le M, \ldots$$

The function $f_n(x)$ represented by the series 

$$f_n(x) = \sum f_0(u) \frac{1}{(Q_n + xQ_{n-1})^2}$$

where $u$ stands for 

$$\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}}$$
has a derivative; for the series obtained by a formal differentiation 

$$f_n'(x) = \sum f_0'(u) \frac{(-1)^n}{(Q_n + xQ_{n-1})^4} - 2 \sum f_0(u) \frac{Q_{n-1}}{(Q_n + xQ_{n-1})^3}$$

is uniformly convergent and represents $f_n'(x)$. Now 

$$\frac{Q_{n-1}}{(Q_n + xQ_{n-1})^3} \le \frac{1}{Q_n^2}$$

and 

$$Q_n^2 \ge \frac{Q_n(Q_n + Q_{n-1})}{2}.$$

Hence 

$$\sum \frac{Q_{n-1}}{(Q_n + xQ_{n-1})^3} \le 2 \sum \frac{1}{Q_n(Q_n + Q_{n-1})} = 2$$

by virtue of (2). On the other hand, the inequality 

$$Q_n(Q_n + Q_{n-1}) = (v_n Q_{n-1} + Q_{n-2})[(v_n + 1) Q_{n-1} + Q_{n-2}] > 2 Q_{n-1}(Q_{n-1} + Q_{n-2})$$

holding for $n \ge 2$ together with an evident inequality 

$$Q_1(Q_1 + Q_0) \ge 2$$

shows that 

$$Q_n(Q_n + Q_{n-1}) > 2^n \qquad (n \ge 2).$$

Thus 

$$(Q_n + xQ_{n-1})^4 \ge Q_n^2 \cdot Q_n^2 \ge \frac{[Q_n(Q_n + Q_{n-1})]^2}{4} > 2^{n-2} Q_n(Q_n + Q_{n-1})$$

and consequently 

$$\left|\sum f_0'(u) \frac{(-1)^n}{(Q_n + xQ_{n-1})^4}\right| < \frac{\mu_0}{2^{n-2}}.$$

Hence, we may conclude that 

$$\mu_1 = \frac{\mu_0}{2^{n-2}} + 4M$$

is an upper bound of $|f_n'(x)|$. Similarly, starting with the second equation 
in (d), we find that 
$$\mu_2 = \frac{\mu_1}{2^{n-2}} + 4M$$

is an upper bound of $|f_{2n}'(x)|$, and so forth. In general, the recurrence 
relation 

$$\mu_k = \frac{\mu_{k-1}}{2^{n-2}} + 4M \qquad (k = 1, 2, 3, \ldots)$$

determines upper bounds of 

$$|f_n'(x)|, \quad |f_{2n}'(x)|, \quad |f_{3n}'(x)|, \ldots$$

It is easy to see that in general 

$$\mu_k < \frac{\mu_0}{2^{k(n-2)}} + \frac{4M}{1 - 2^{-(n-2)}}$$

so that for sufficiently large $n$ 

$$\mu_k < 5M.$$

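5. Main Inequalities. Let 

$$\varphi_0(x) = f_0(x) - \frac{m_0}{1 + x}.$$

Then 

$$f_n(x) - \frac{m_0}{1 + x} = \varphi_n(x) = \sum \varphi_0(u) \frac{1}{(Q_n + xQ_{n-1})^2}.$$

Since the intervals $\delta$ defined at the end of Sec. 2 do not overlap and cover 
completely the whole interval (0, 1), we may write: 

$$l = \frac{1}{2} \int_0^1 \varphi_0(x)\,dx = \frac{1}{2} \sum \int_{\delta} \varphi_0(x)\,dx = \frac{1}{2} \sum \frac{\varphi_0(u_1)}{Q_n(Q_n + Q_{n-1})}$$

the latter part following from the mean value theorem and $u_1$ being a 
number contained within the interval $\delta$. Since $\varphi_0(u) \ge 0$ and 
$(Q_n + xQ_{n-1})^2 \le 2Q_n(Q_n + Q_{n-1})$, by subtraction we find 

$$f_n(x) - \frac{m_0}{1 + x} - l \ge \frac{1}{2} \sum [\varphi_0(u) - \varphi_0(u_1)] \frac{1}{Q_n(Q_n + Q_{n-1})}$$

and, since both $u$ and $u_1$ belong to the same interval $\delta$, 

$$|\varphi_0(u) - \varphi_0(u_1)| \le \frac{\mu_0 + m_0}{Q_n(Q_n + Q_{n-1})} < \frac{\mu_0 + m_0}{2^n}.$$

Consequently, 

$$f_n(x) - \frac{m_0}{1 + x} - l > -\frac{\mu_0 + m_0}{2^{n+1}}$$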
and a fortiori 

$$f_n(x) > \frac{m_0 + l - 2^{-n}(\mu_0 + m_0)}{1 + x}.$$

It follows that 

$$(5)\qquad m_1 \ge m_0 + l - 2^{-n}(\mu_0 + m_0).*$$

* $M_1$, $m_1$ are used here with the same meaning as $M_n$, $m_n$ in Sec. 4. 

In a similar way, considering the function 

$$\psi_0(x) = \frac{M_0}{1 + x} - f_0(x)$$

and setting 

$$l_1 = \frac{1}{2} \int_0^1 \psi_0(x)\,dx,$$

we shall have 

$$f_n(x) < \frac{M_0 - l_1 + 2^{-n}(\mu_0 + M_0)}{1 + x}$$

whence 

$$(6)\qquad M_1 \le M_0 - l_1 + 2^{-n}(\mu_0 + M_0).$$

Further, from (5) and (6) 

$$M_1 - m_1 \le M_0 - m_0 + 2^{-n+1}(\mu_0 + M_0) - l - l_1.$$

But 

$$l + l_1 = \tfrac{1}{2} \log 2 \cdot (M_0 - m_0) = (1 - k)(M_0 - m_0); \qquad k < 0.66,$$

so that finally 

$$M_1 - m_1 < k(M_0 - m_0) + 2^{-n+1}(\mu_0 + M_0).$$

Starting with $f_n(x), f_{2n}(x), \ldots$ instead of $f_0(x)$, in a similar way we find 

$$M_2 - m_2 < k(M_1 - m_1) + 2^{-n+1}(\mu_1 + M_1)$$

$$M_3 - m_3 < k(M_2 - m_2) + 2^{-n+1}(\mu_2 + M_2)$$

$$\cdots$$

$$M_n - m_n < k(M_{n-1} - m_{n-1}) + 2^{-n+1}(\mu_{n-1} + M_{n-1}).$$

From these inequalities it follows that 

$$M_n - m_n < (M_0 - m_0) k^n + 2^{-n+1} [\mu_0 k^{n-1} + \mu_1 k^{n-2} + \cdots + \mu_{n-1} + M_0 k^{n-1} + M_1 k^{n-2} + \cdots + M_{n-1}].$$

Without losing anything in generality, we may suppose that $f_0(x)$ is a 
positive function. Then 
$$M_k \le M_0, \quad \mu_k < 5M_0 \qquad (k = 1, 2, 3, \ldots)$$

at least for sufficiently large $n$. Owing to these inequalities we shall have 

$$(7)\qquad M_n - m_n < (M_0 - m_0) k^n + 2^{-n+1} \left[\mu_0 k^{n-1} + \frac{6M_0}{1 - k}\right].$$

This inequality shows that sequences 

$$M_0 \ge M_1 \ge M_2 \ge \cdots$$
$$m_0 \le m_1 \le m_2 \le \cdots$$

approach a common limit $a$. The following method can be used to find 
the value of this limit. Let $N$ be an arbitrary sufficiently large integer 
and $n$ the integer defined by 

$$n^2 \le N < (n + 1)^2.$$

Then 

$$\frac{m_n}{1 + x} \le f_{n^2}(x) \le \frac{M_n}{1 + x}$$

and therefore 

$$\frac{m_n}{1 + x} \le f_N(x) \le \frac{M_n}{1 + x}.$$

The last inequality permits presenting $f_N(x)$ thus: 

$$(8)\qquad f_N(x) = \frac{a}{1 + x} + \theta(M_n - m_n); \qquad |\theta| < 1,$$

whence 

$$\int_0^1 f_N(x)\,dx = \int_0^1 f_0(x)\,dx = a \log 2 + \theta'(M_n - m_n), \qquad |\theta'| < 1,$$

and, because $M_n - m_n$ ultimately becomes as small as we please in 
absolute value, 

$$a \log 2 = \int_0^1 f_0(x)\,dx.$$

Equation (8) shows clearly that the sequence of functions 

$$f_0(x), f_1(x), f_2(x), \ldots$$

defined by the recurrence relation (4) approaches uniformly the limit 
function 

$$\frac{a}{1 + x}$$

where 

$$a = \frac{1}{\log 2} \int_0^1 f_0(x)\,dx.$$
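6. Solution of the Gaussian Problem. It suffices to apply the preceding 
considerations to the case $f_0(x) = p_0(x) = 1$. In this case $M_0 = 2$, 
$m_0 = 1$, $\mu_0 = 0$ and 

$$a = \frac{1}{\log 2}.$$

Consequently, 

$$p_N(x) = \frac{1}{(1 + x) \log 2} + \theta \left[k^n + \frac{3}{(1 - k) 2^{n-3}}\right], \qquad |\theta| < 1,$$

where $n = [\sqrt{N}]$. It suffices to integrate this expression between limits 
0 and $t < 1$ to find 

$$P_N(t) = \frac{\log (1 + t)}{\log 2} + \theta' \left[k^n + \frac{3}{(1 - k) 2^{n-3}}\right], \qquad |\theta'| < 1.$$

As $N \to \infty$ 

$$P_N(t) \to \frac{\log (1 + t)}{\log 2}$$

as stated by Gauss. Moreover, 

$$\left|P_N(t) - \frac{\log (1 + t)}{\log 2}\right| < k^n + \frac{3}{(1 - k) 2^{n-3}}$$

for sufficiently large, but finite $N$. 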
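The limit law can also be observed empirically. The sketch below takes equally spaced values of $M$ in (0, 1), removes the first $n$ incomplete quotients by repeating the step $x \to 1/x - [1/x]$, and records how often the remaining quantity is less than $t$. The sample size, the use of floating-point arithmetic, the treatment of terminating expansions (counted as a remainder of 0), and the function names are all assumptions of this illustration.

```python
import math

def tail_after(m, n):
    """Remainder xi left after removing n incomplete quotients from m in (0, 1).

    Each step replaces x by the fractional part of 1/x. If the expansion
    terminates, the remainder is taken to be 0 (an assumption of this sketch).
    """
    x = m
    for _ in range(n):
        if x <= 0.0:
            return 0.0
        y = 1.0 / x
        x = y - math.floor(y)
    return x

def empirical_P(n, t, samples=100000):
    """Fraction of equally spaced values M = (j + 1/2)/samples whose remainder after n steps is < t."""
    hits = sum(1 for j in range(samples) if tail_after((j + 0.5) / samples, n) < t)
    return hits / samples

for t in (0.25, 0.5, 0.75):
    print(t, empirical_P(4, t), math.log(1 + t) / math.log(2))
```

Already for a small number of steps the observed frequencies lie close to $\log(1 + t)/\log 2$, in agreement with the rapid convergence expressed by the bound above.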
Arrangements, 18 
Bayes' formula (theorem), 61 
Bernoulli criterion, 5 
Bernoulli theorem, 96 
Bernoulli trials, 45 
Bernstein, S., inequality, 205 
Bertrand’s paradox, 251 
Buffon’s needle problem, 113, 251 
Barbier’s solution of, 253 

C 

Cantelli’s theorem, 101 
Cauchy’s distribution, 243, 275 
Characteristic function, composition of, 
275 

of distribution, 240, 264 
Coefficient, correlation, 339 
divergence, 212, 214, 216 
Combinations, 18 

Compound probability, theorem of, 31 
Continued fractions, 358, 361, 396 
Markoff’s method of, 52 
Continuous variables, 235 
Correlation, normal {see Normal cor- 
relation) 

Correlation coefficient, distribution of, 
339 

D 

Difference equations, ordinary, 75, 78 
partial, 84 

Dispersion, definition, 172 
of sums, 173 

Distribution, Cauchy's, 243, 275 
characteristic function of, 264 
of correlation coefficient, 339 


Distribution, determination of, 271 
equivalent point, 369 
general concept of, 263 
normal (Gaussian), 243 
Poisson’s, 279 
“Student’s,” 339 

Distribution function of probability, 
239, 263 

Divergence coefficient, empirical, 212 
Lexis’ case, 214 
Poisson’s case, 214 
theoretical, 212 
Tschuprow’s theorem, 216 

E 

Elementary errors, hypothesis of, 296 
Ellipses of equal probability, 311, 328 
Estimation of error term, 295 
Euler’s summation formula, 177, 201, 
303, 347 

Events, compound, 29 
contingent, 3 
dependent, 33 
equally likely, 4, 5, 7 
exhaustive, 6 
future, 65 
incompatible, 37 
independent, 32, 33 
mutually exclusive, 6, 27 
opposite, 29 

Expectation, mathematical, 161 
of a product, 171 
of a sum, 165 

F 

Factorials, 349 
Fourier theorem, 241 
French lottery, 19, 108 
Frequency, 96 

Fundamental lemma (see Limit theorem) 
Fundamental theorem (see Tshebysheff- 
Markoff theorem) 
G 

Gaussian distribution, 243 
Gaussian problem, 396 
Generating function of probabilities, 
47, 78, 85, 89, 93, 94 

H 

Hermite polynomials, 72 
Hypothesis of elementary errors, 296 

I 

Independence, definition of, 32, 33 

K 

Khintchine (see Law of large numbers) 
Kolmogoroff (see Law of large numbers; 
Strong law of large numbers) 

L 

Lagrange’s series, 84, 150 
Laplace-Liapounoff (see Limit theorem) 
Laplace’s problem, 255 
Laurent’s series, 87, 148 
Law of large numbers, generalization 
by Markoff, 191 

for identical variables (Khintchine), 
195 

Kolmogoroff’s lemma, 201 
theorem, 185 
Tshebysheff’s lemma, 182 
Law of repeated logarithm, 204 
Liapounoff condition (see Limit theorem) 
Liapounoff inequality, 265 
Limit theorem, Bernoullian case, 131 
323, 325, 326 
fundamental lemma, 284 
Laplace-Liapounoff, 284 
Line of regression, 314 
Lottery, French (see French lottery) 
Marbe’s problem, 231 
Markoff’s theorem, infinite dispersion, 
Markoff’s theorem, for simple chains, 301 
Markoff-Tshebysheff theorem (see 
Tshebysheff-Markoff theorem) 
Mathematical expectation, definition of, 
161 

of a product, 171 
of a sum, 165 

Mathematical probability, definition of, 
6 

Moments, absolute, 240, 264 
inequalities for, 264 
method of (Markoff’s), 356ff. 

N 

Normal correlation, 313 
origin of, 327 

Normal distribution, Gaussian, 243 
two-dimensional, 308 

P 

Pearson’s “χ²-test,” 327 
Permutations, 18 
Point, of continuity, 261, 356 
of increase, 262, 356 
Poisson series, 182, 293 
Poisson’s case, 214 
Poisson’s distribution, 279 
Poisson’s formula, 137 
Poisson’s theorem, 208, 294 
Polynomials, Hermite (see Hermite) 
Probability, approximate evaluation of, 
by Markoff’s method, 52 
compound, 29, 31 
conditional, 33 
definition (classical) of, 6 
total, 27, 28 

Probability integral, 128 
table of, 407 

R 

Relative frequency, 96 
Runs, problem of, 77 

S 

Simple chains, 74, 223, 297 
Markoff’s theorem for, 301 
Standard deviation, 173 
Stieltjes’ integrals, 261 
Stirling’s formula, 349 
Stochastic variables, 161 
Strong law of large numbers (Kolmo- 
goroff), 202 

“Student’s” distribution, 339 
T 

Table of probability integral, 407 
Tests of significance, 331 
Total probability, theorem of, 27, 28 
Trials, dependent, independent, repeated, 
44, 45 


Tschuprow (see Divergence coefficient) 
Tshebysheff-Markoff theorem, fundamental, 304, 384 
application, 388 
Tshebysheff’s inequalities, 373 
Tshebysheff’s inequality, 204 
Tshebysheff’s lemma, 182 
Tshebysheff’s problem, 199 

V 

Variables, continuous, 235 
independent, 171 
stochastic, 161 
Vectors (see Limit theorem) 



